Reducing the Safety Tax in LLM Safety Alignment with On-Policy Self-Distillation
arXiv CS.LG · May 18, 2026
This paper introduces on-policy self-distillation (OPSA) to reduce the "safety tax" in LLM safety alignment. OPSA addresses the distributional mismatch of off-policy training by having the model generate its own rollouts and receive dense per-token KL supervision from a frozen teacher.
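To give a rough idea of what "dense per-token KL supervision" on a self-generated rollout could look like, here is a minimal Python sketch. It is not the paper's implementation: the KL direction (student relative to teacher), the toy logits, and the function names `per_token_kl` and `rollout_loss` are all assumptions made for illustration. In a real setup the logits would come from the trained model and the frozen teacher, both evaluated on tokens the trained model sampled itself.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one vocabulary-sized list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def per_token_kl(student_logits, teacher_logits):
    """Return KL(student || teacher) at every position of one rollout.

    Assumption: reverse KL is used here; the summary does not state
    which direction the paper uses.
    """
    kls = []
    for s_row, t_row in zip(student_logits, teacher_logits):
        p = softmax(s_row)  # student distribution at this position
        q = softmax(t_row)  # frozen-teacher distribution at this position
        kl = sum(pi * (math.log(pi) - math.log(qi))
                 for pi, qi in zip(p, q) if pi > 0.0)
        kls.append(kl)
    return kls


def rollout_loss(student_logits, teacher_logits):
    """Average the per-token KL values so every token contributes a signal."""
    kls = per_token_kl(student_logits, teacher_logits)
    return sum(kls) / len(kls)


# Hypothetical toy rollout: 2 positions, vocabulary of 3 tokens.
student = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
teacher = [[2.0, 0.5, -1.0], [1.5, -0.5, 0.0]]

token_kls = per_token_kl(student, teacher)
loss = rollout_loss(student, teacher)
```

The supervision is "dense" in the sense that each position in the rollout yields its own KL term, instead of a single scalar reward for the whole sequence. In this toy example the first position matches the teacher exactly and contributes nothing, while the second position diverges and contributes a positive penalty.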