TUR-DPO: Topology- and Uncertainty-Aware Direct Preference Optimization
arXiv cs.AI · May 4, 2026
TUR-DPO is a topology- and uncertainty-aware variant of Direct Preference Optimization (DPO) for aligning large language models (LLMs) with human preferences. It extends DPO with reasoning-topology and uncertainty signals, so that training rewards how an answer is derived, not only what it says.
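For context, here is a minimal sketch of the standard per-pair DPO loss that TUR-DPO builds on. The summary does not give the TUR-DPO objective, so the `weight` argument is a hypothetical stand-in for where a per-example signal (such as an uncertainty-derived confidence) could enter. It is an assumption for illustration, not the paper's formulation, and the function and parameter names are likewise illustrative.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp,
             beta=0.1, weight=1.0):
    """Standard DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy and the frozen reference model.
    `weight` is a HYPOTHETICAL per-example scale (not from the paper);
    weight=1.0 recovers plain DPO.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)) computed in a numerically stable way.
    if margin >= 0:
        nll = math.log1p(math.exp(-margin))
    else:
        nll = -margin + math.log1p(math.exp(margin))
    return weight * nll


# Example: the policy favors the chosen response more than the reference does.
loss = dpo_loss(-12.0, -15.0, -13.0, -14.0, beta=0.1)
```

When the policy and reference agree exactly, the margin is zero and the loss equals log 2. The loss shrinks as the policy's preference for the chosen response grows relative to the reference.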