Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models
arXiv CS.LG · May 15, 2026
This paper introduces TraFL, a post-training approach for diffusion language models that addresses the "trajectory locking" observed in reward-maximizing methods. TraFL trains with a trajectory-balance objective and is reported to outperform other methods on mathematical reasoning and code generation benchmarks.