RESEARCH · 29

When Does a Language Model Commit? A Finite-Answer Theory of Pre-Verbalization Commitment

arXiv cs.AI · May 11, 2026

This paper introduces "finite-answer preference stabilization," a method for identifying when a language model's preference among a finite set of candidate answers becomes stable, ahead of its final output. It reports that this stabilization often occurs before the answer can be parsed from the generated text, with a significant lead time.

language models · cognitive science · machine learning · NLP · AI Research
