Self-Distillation

4 items

RESEARCH · arXiv CS.LG · 6d ago

Self-Distilled Policy Gradient

This paper introduces Self-Distilled Policy Gradient (SDPG), a framework that improves sparse-reward reinforcement learning through on-policy self-distillation. SDPG combines group-relative verifier advantages, exact full-vocabulary self-distillation, and KL regularization, and reports better stability and performance than existing baselines.

language models · deep learning · reinforcement learning · Policy Gradient
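A minimal sketch of two ingredients named in the SDPG summary, assuming binary verifier rewards and per-position next-token distributions given as plain probability lists: group-relative advantages (each reward normalized by the mean and standard deviation of its sampling group) and an exact KL divergence summed over the whole vocabulary, contrasted with a single-sample estimate. The function names, the KL direction (teacher relative to policy), and the toy numbers are assumptions for illustration; this is not the SDPG authors' code, and how SDPG weights and combines these terms is not shown.

```python
import math
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Score each sampled response against the other responses drawn for the same prompt."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # eps keeps the division defined when every response got the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


def full_vocab_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Exact KL(p || q) summed over every vocabulary entry."""
    # Terms with p_i == 0 contribute nothing, so they are skipped.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def sampled_kl_estimate(p: Sequence[float], q: Sequence[float], token: int) -> float:
    """Single-sample Monte Carlo estimate log p(x) - log q(x), with x assumed drawn from p."""
    return math.log(p[token]) - math.log(q[token])


# Binary verifier rewards for a group of four sampled responses.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# Toy 3-token vocabulary at one position (hypothetical): the teacher distribution,
# which in self-distillation comes from the same model, vs. the current policy.
teacher = [0.7, 0.2, 0.1]
policy = [0.5, 0.3, 0.2]
exact = full_vocab_kl(teacher, policy)
one_sample = sampled_kl_estimate(teacher, policy, token=1)
```

A single sampled token can give a negative estimate (as with `token=1` here), while the exact sum over the vocabulary is never negative; computing the full sum removes the sampling noise of the single-token estimate at the cost of touching every vocabulary entry.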

RESEARCH · arXiv CS.CL · 4/15/2026

Self-Distillation Zero: Self-Revision Turns Binary Rewards into Dense Supervision

Self-Distillation Zero (SD-Zero) is a post-training method designed to be more sample-efficient than standard reinforcement learning, without requiring external teachers or high-quality demonstrations. A single model acts as both a Generator and a Reviser; the Reviser's improved responses and token distributions provide dense supervision for the Generator through on-policy self-distillation.

reinforcement learning · post-training · Dense Supervision · Self-Distillation
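A minimal sketch of why Reviser token distributions count as dense supervision, assuming the Reviser's and Generator's next-token distributions are aligned position by position over the same response and given as plain probability lists. The soft-target cross-entropy loss, the function names, and the toy numbers are illustrative choices, not SD-Zero's published objective.

```python
import math
from typing import List, Sequence


def soft_target_cross_entropy(target: Sequence[float], student: Sequence[float]) -> float:
    """Cross-entropy of the student's distribution against a soft target distribution."""
    return -sum(t * math.log(s) for t, s in zip(target, student) if t > 0.0)


def dense_token_losses(
    reviser_dists: Sequence[Sequence[float]],
    generator_dists: Sequence[Sequence[float]],
) -> List[float]:
    """One loss per token position, using the Reviser's distribution as the target."""
    return [soft_target_cross_entropy(r, g) for r, g in zip(reviser_dists, generator_dists)]


# Toy 3-token vocabulary over a 2-token response (hypothetical values).
generator = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
reviser = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]

# A binary reward would give a single scalar for the whole response;
# here every position receives its own training signal.
losses = dense_token_losses(reviser, generator)
print(len(losses))  # 2
```

The second position gets the larger loss because the Generator puts only 0.3 on the token the Reviser prefers. Cross-entropy and KL against a fixed target differ only by the target's entropy, so either gives the Generator the same gradient.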

RESEARCH · arXiv CS.CL · 15d ago

EchoDistill: Alignment Noisy-to-Clean Self-Distillation for Robust Audio LLMs

EchoDistill is an alignment-based self-distillation framework designed to make Audio Large Language Models (ALLMs) robust to real-world noise. It leverages a frozen clean-audio teacher to guide an inference-time noisy-audio student, optimizing responses via group-relative policy optimization and token-level consistency.

robustness · Audio LLMs · machine learning · Self-Distillation
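The EchoDistill summary names group-relative policy optimization (GRPO). A minimal sketch of the PPO-style clipped surrogate that GRPO builds on, simplified to one probability ratio per response and assuming the advantages have already been normalized within the group. The token-level consistency term, the clean-teacher/noisy-student pairing, and the KL penalty that GRPO usually includes are not modeled; function names and numbers are hypothetical, not taken from the paper.

```python
from typing import Sequence


def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """PPO-style clipped objective for one response (to be maximized)."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # Taking the minimum means pushing the ratio past the clip range
    # in the direction the advantage favors earns nothing extra.
    return min(ratio * advantage, clipped_ratio * advantage)


def group_objective(
    ratios: Sequence[float], advantages: Sequence[float], eps: float = 0.2
) -> float:
    """Mean clipped objective over one group of responses sampled for the same prompt."""
    terms = [clipped_surrogate(r, a, eps) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)


# ratio = pi_new(response) / pi_old(response); advantages already normalized within the group.
ratios = [1.5, 0.5, 1.0, 1.1]
advantages = [1.0, -1.0, 0.5, -0.5]
objective = group_objective(ratios, advantages)
```

The first two responses show the clipping: a ratio of 1.5 with a positive advantage is credited as 1.2, and a ratio of 0.5 with a negative advantage is scored at 0.8 times the advantage, so the update gains nothing by moving either ratio further.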

RESEARCH · Hugging Face (YouTube) · 4/16/2026

Hugging Face Journal Club: Embarrassingly Simple Self-Distillation Improves Code Generation

This Hugging Face Journal Club session discusses an "embarrassingly simple" self-distillation method reported to substantially improve code generation, in the context of applying large language models to programming tasks.

machine learning · code generation · Self-Distillation · large language models
