Next-token prediction — AI articles, news & research

RESEARCH · arXiv CS.LG · 4/15/2026

How Transformers Learn to Plan via Multi-Token Prediction

This paper investigates how multi-token prediction (MTP) enables Transformers to learn to plan, outperforming standard next-token prediction (NTP). Empirically, MTP consistently improves performance on reasoning tasks; theoretically, it induces a two-stage reverse reasoning process via gradient decoupling.

Next-token prediction · Planning · Multi-Token Prediction · Reasoning