
The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes

arXiv cs.AI · May 13, 2026

On-policy distillation (OPD) and on-policy self-distillation (OPSD) are promising post-training methods for large language models, but their effectiveness is inconsistent. This paper empirically examines where they succeed and where they fail, identifying sensitivity to the choice of teacher and problems that arise from the use of privileged information.

