← heapsort
RESEARCH27

The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes

arXiv CS.AIΒ·May 13, 2026

On-policy distillation (OPD) and self-distillation (OPSD) are promising post-training methods for large language models, but their effectiveness is inconsistent. This research empirically investigates their successes and failures, identifying sensitivities to teacher choice and issues with privileged information.

Read original β†—