RESEARCH27

Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models

arXiv CS.LG·May 15, 2026

This paper introduces TraFL, a novel post-training approach for diffusion language models that addresses "trajectory locking" observed in reward-maximizing methods. TraFL, a trajectory-balance objective, outperforms other methods across mathematical reasoning and code generation benchmarks.

Diffusion Models language models reinforcement learning machine learning AI research

Read original ↗