Distributionally Robust Optimization — AI articles, news & research

RESEARCH · arXiv cs.LG · 4/13/2026

Distributionally Robust Token Optimization in RLHF

Large language models (LLMs) can fail when a prompt shifts only slightly, and the problem is most pronounced in multi-step reasoning. To address this, researchers propose Distributionally Robust Token Optimization (DRTO), which combines token-level Reinforcement Learning from Human Feedback (RLHF) with Distributionally Robust Optimization (DRO). DRO trains against the worst case over a set of plausible data distributions rather than only the observed one. DRTO uses this to make model outputs more consistent under distribution shift, and the authors report improvements on mathematical reasoning benchmarks.
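The summary does not say which uncertainty set or token-level objective DRTO actually uses. As a generic illustration of the DRO idea only, the sketch below assumes a KL-divergence penalty around a uniform distribution over per-token losses. Under that assumption the worst-case expected loss has the closed form tau * log(mean(exp(loss / tau))), and the adversarial weights are a softmax of the losses. The function names and the temperature `tau` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def kl_dro_weights(losses: Sequence[float], tau: float) -> list[float]:
    """Worst-case token weights under a KL penalty of strength tau.

    Assumes a uniform nominal distribution over tokens (an illustrative
    choice, not necessarily DRTO's). Higher-loss tokens receive more weight.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    m = max(losses)  # subtract the max before exponentiating for stability
    exps = [math.exp((loss - m) / tau) for loss in losses]
    z = sum(exps)
    return [e / z for e in exps]


def kl_dro_objective(losses: Sequence[float], tau: float) -> float:
    """KL-penalized worst-case expected loss: tau * log(mean(exp(loss / tau))).

    It lies between the mean loss (tau -> infinity) and the max loss (tau -> 0).
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    m = max(losses)
    n = len(losses)
    return m + tau * math.log(sum(math.exp((loss - m) / tau) for loss in losses) / n)


if __name__ == "__main__":
    token_losses = [0.1, 0.2, 1.5]  # made-up per-token losses for illustration
    print(kl_dro_weights(token_losses, tau=0.5))
    print(kl_dro_objective(token_losses, tau=0.5))
```

The weights returned by `kl_dro_weights` are exactly the gradient of `kl_dro_objective` with respect to each token's loss, so minimizing the robust objective automatically concentrates the update on the tokens that currently fare worst. `tau` controls how pessimistic the objective is.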

DRO · LLMs · RLHF · Distributionally Robust Optimization