RESEARCH27

How Transformers Learn to Plan via Multi-Token Prediction

arXiv CS.LG · April 15, 2026

This paper investigates how multi-token prediction (MTP) enables Transformers to learn to plan and why it outperforms standard next-token prediction (NTP). Empirically, MTP consistently improves performance on reasoning tasks. Theoretically, it induces a two-stage reverse reasoning process via gradient decoupling.
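
To make the NTP/MTP contrast concrete, here is a minimal sketch of how the training targets differ. It is a generic illustration, not the paper's objective: the function names `mtp_targets` and `mtp_loss`, the `probs` layout, and the equal weighting of all future offsets are assumptions made for this example. With `k = 1` the targets reduce to ordinary next-token prediction.

```python
import math


def mtp_targets(tokens, k):
    """For each position t, return the next k tokens (fewer near the end).

    With k = 1 this is the usual next-token target.
    """
    return [tokens[t + 1 : t + 1 + k] for t in range(len(tokens) - 1)]


def mtp_loss(probs, tokens, k):
    """Average negative log-likelihood over all (position, offset) pairs.

    Hypothetical layout: probs[t][j] is a dict mapping a token to the
    probability the model assigns to it at position t + 1 + j, given the
    prefix tokens[: t + 1]. Every offset is weighted equally (an assumption).
    """
    total, count = 0.0, 0
    for t, future in enumerate(mtp_targets(tokens, k)):
        for j, target in enumerate(future):
            total -= math.log(probs[t][j][target])
            count += 1
    return total / count


tokens = ["a", "b", "c", "d"]
ntp = mtp_targets(tokens, 1)  # one target per position
mtp = mtp_targets(tokens, 2)  # up to two future tokens per position

# Toy "model" that assigns probability 0.5 to every token at every offset.
uniform = [[{tok: 0.5 for tok in tokens} for _ in range(2)] for _ in tokens]
loss = mtp_loss(uniform, tokens, 2)
```

In this toy setup each position is supervised by several future tokens at once, which is the property the summary credits for the planning behavior. How the per-offset losses are weighted and how the prediction heads share parameters are left open here.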
