post-training

4 items

ARTICLE·↑ trending·Reddit r/MachineLearning·4/10/2026

Started a video series on building an orchestration layer for LLM post-training [P]

The author has started a video series on building an orchestration layer for LLM post-training. The post describes efforts to improve the `verl` framework for RL training at scale, focusing on modernizing packages and removing irrelevant dependencies.

reinforcement learning post-training orchestration frameworks

RESEARCH·arXiv CS.AI·4/17/2026

GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification

This work introduces Group Fine-Tuning (GFT), a unified post-training framework for large language models. It addresses intrinsic limitations of supervised fine-tuning (SFT), such as single-path dependency and entropy collapse, through Group Advantage Learning and Dynamic Coefficient Rectification.

LLMs reinforcement learning post-training machine learning

RESEARCH·arXiv CS.CL·4/15/2026

Self-Distillation Zero: Self-Revision Turns Binary Rewards into Dense Supervision

Self-Distillation Zero (SD-Zero) is a novel post-training method designed to be more sample-efficient than traditional reinforcement learning, without requiring an external teacher or high-quality demonstrations. A single model acts as both Generator and Reviser; the Reviser's improved responses and token distributions provide dense supervision for the Generator through on-policy self-distillation.

reinforcement learning post-training Dense Supervision Self-Distillation

RESEARCH·arXiv CS.AI·28d ago

On Distinguishing Capability Elicitation from Capability Creation in Post-Training: A Free-Energy Perspective

This research proposes distinguishing between capability elicitation and capability creation in large language model post-training. It argues that elicitation reweights existing behaviors within a model's accessible support, while creation changes that support itself, and it develops the distinction through a free-energy view.

LLMs AI capabilities Machine Learning Theory learning
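
The "elicitation reweights existing behaviors" idea in the last item can be illustrated with a toy example. This is a minimal sketch, not the paper's formulation: it assumes a standard exponential reweighting of a base distribution, and the names `tilt`, `support`, `base`, `reward`, and `beta` are hypothetical, chosen only for illustration. It shows that reweighting alone can shift probability mass toward high-reward behaviors but cannot assign mass to a behavior the base distribution gives zero probability.

```python
import math

def tilt(base, reward, beta):
    """Reweight base probabilities by exp(reward / beta), then renormalize.

    Illustrative assumption: a standard exponential tilt, not a method
    taken from the listed paper.
    """
    weights = {k: p * math.exp(reward[k] / beta) for k, p in base.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}

def support(dist):
    """Return the set of behaviors with non-zero probability."""
    return {k for k, p in dist.items() if p > 0}

# Toy behaviors: "c" has the highest reward but zero base probability.
base = {"a": 0.7, "b": 0.3, "c": 0.0}
reward = {"a": 0.0, "b": 1.0, "c": 5.0}

tilted = tilt(base, reward, beta=0.5)

# Mass moves from "a" to "b", but "c" stays at zero: the support is unchanged.
print(tilted)
print(support(tilted) == support(base))
```

In this toy setting, getting any probability on "c" would require changing the base distribution itself, which is the kind of change the summary describes as creation rather than elicitation.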