The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes
arXiv CS.AI · May 13, 2026
On-policy distillation (OPD) and self-distillation (OPSD) are promising post-training methods for large language models, but their effectiveness is inconsistent. This research empirically investigates their successes and failures, identifying sensitivities to teacher choice and issues with privileged information.