
Calibrated Preference Learning: The Case of Label Ranking

arXiv CS.LG · June 1, 2026

This paper formalizes calibration for probabilistic label ranking, introducing a hierarchy of notions for full, sub-ranking, and top-k calibration. Empirically, popular label ranking models are often poorly calibrated, with implications for RLHF reward models.
