
Post-training is (Massive) Supervised Learning

arXiv cs.CL · June 9, 2026

This paper argues that the prevailing post-training paradigm for LLMs, which combines supervised fine-tuning (SFT) and reinforcement learning (RL), effectively reverts to the "pre-train, then fine-tune" approach by explicitly tailoring models to specific benchmarks. As empirical evidence, the authors show that models post-trained from scratch can achieve strong performance on reasoning datasets.
