
Distributionally Robust Optimization


Research · arXiv cs.LG · 4/13/2026

Distributionally Robust Token Optimization in RLHF

Large language models (LLMs) can fail when a prompt shifts only slightly, and the problem is especially pronounced in multi-step reasoning. To address this, the researchers propose Distributionally Robust Token Optimization (DRTO), which combines token-level Reinforcement Learning from Human Feedback (RLHF) with Distributionally Robust Optimization (DRO) to make model outputs more consistent under distribution shift. The authors report improvements on mathematical reasoning benchmarks.
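The summary does not spell out DRTO's exact objective, so the following is only a minimal sketch of how a DRO layer can sit on top of a token-level loss. It assumes a generic KL-constrained DRO formulation, in which the worst-case distribution over prompts (for example, paraphrased variants of one math problem) reweights each prompt's loss by exp(loss / temperature). The temperature plays the role of the dual variable tied to the KL radius. All function names (`kl_dro_weights`, `token_level_loss`, `dro_objective`) are hypothetical, and the code works on plain floats rather than autograd tensors.

```python
import math

# Hypothetical sketch: generic KL-constrained DRO reweighting over a
# token-level, policy-gradient-style loss. Not the DRTO paper's method.

def kl_dro_weights(losses, temperature):
    """Worst-case weights inside a KL ball: w_i proportional to exp(loss_i / temperature).

    A small temperature concentrates weight on the hardest prompts;
    as temperature grows, the weights approach uniform (ordinary averaging).
    """
    m = max(losses)  # subtract the max so exp() cannot overflow
    exps = [math.exp((l - m) / temperature) for l in losses]
    z = sum(exps)
    return [e / z for e in exps]

def token_level_loss(token_logprobs, token_advantages):
    """Surrogate loss -mean_t(A_t * log pi(y_t | context)) for one response."""
    if len(token_logprobs) != len(token_advantages):
        raise ValueError("need one advantage per token")
    total = sum(a * lp for a, lp in zip(token_advantages, token_logprobs))
    return -total / len(token_logprobs)

def dro_objective(per_prompt_tokens, temperature=1.0):
    """Combine per-prompt token-level losses under the worst-case weighting.

    In a real trainer the weights would usually be treated as constants
    (no gradient flows through them), and the losses would be tensors.
    """
    losses = [token_level_loss(lps, advs) for lps, advs in per_prompt_tokens]
    weights = kl_dro_weights(losses, temperature)
    return sum(w * l for w, l in zip(weights, losses)), weights

if __name__ == "__main__":
    # Two hypothetical prompt variants: (token log-probs, token advantages).
    batch = [([-0.5, -1.0], [1.0, 1.0]), ([-2.0, -3.0], [1.0, 1.0])]
    objective, weights = dro_objective(batch, temperature=1.0)
    print(f"objective={objective:.3f} weights={[round(w, 3) for w in weights]}")
```

Because the weights tilt toward larger losses, the robust objective is never below the plain average, which is what pushes training to fix the prompt variants the model currently handles worst.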
