Post-training is (Massive) Supervised Learning
This paper argues that the prevailing post-training paradigm for LLMs, involving SFT and RL, effectively reverts to the "pre-train then fine-tune" approach, explicitly tailoring models to specific benchmarks. Empirical evidence shows that models post-trained from scratch can yield significant performance on reasoning datasets.

