Universal Transformers Need Memory: Depth-State Trade-offs in Adaptive Recursive Reasoning
arXiv cs.LG · April 27, 2026
This paper asks whether learned memory tokens, acting as a computational scratchpad, are necessary for Universal Transformers with Adaptive Computation Time (ACT) on Sudoku-Extreme, a combinatorial reasoning benchmark. It finds that memory tokens are empirically necessary for non-trivial performance, identifies a sharp lower threshold on the number of memory tokens needed for optimal performance, and describes a common router initialization trap.
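The summary names two mechanisms, ACT halting and memory tokens used as a scratchpad, without spelling out how they interact. The Python sketch below shows one way they can fit together, under stated assumptions: Graves-style per-position halting (each position's ponder cost is N + R), a toy mean-mixing function standing in for the weight-shared Transformer block, and a sigmoid halting unit. `prepend_memory`, `mean_mix_step`, `make_halt`, and `act_recurrence` are hypothetical names, not the paper's code.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def prepend_memory(tokens: List[Vector], memory: List[Vector]) -> List[Vector]:
    """Place memory ("scratchpad") vectors in front of the input tokens.

    Hypothetical layout: memory slots are ordinary sequence positions that the
    shared block reads and writes, but they carry no puzzle input.
    """
    return [list(m) for m in memory] + [list(t) for t in tokens]


def mean_mix_step(state: List[Vector]) -> List[Vector]:
    """Toy stand-in for the weight-shared block: every position moves halfway
    toward the sequence mean, so information flows through all slots,
    memory slots included. The sequence mean is preserved."""
    dim = len(state[0])
    mean = [sum(v[d] for v in state) / len(state) for d in range(dim)]
    return [[0.5 * x + 0.5 * m for x, m in zip(v, mean)] for v in state]


def make_halt(bias: float) -> Callable[[Vector], float]:
    """Hypothetical halting unit: sigmoid(bias + sum(v))."""
    return lambda v: 1.0 / (1.0 + math.exp(-(bias + sum(v))))


def act_recurrence(
    seq: List[Vector],
    step: Callable[[List[Vector]], List[Vector]],
    halt_prob: Callable[[Vector], float],
    max_steps: int = 8,
    eps: float = 0.01,
) -> Tuple[List[Vector], List[int], List[float]]:
    """Per-position ACT around one weight-shared step.

    Returns the halting-weighted outputs, the number of steps N taken by each
    position, and each position's ponder cost N + R.
    """
    n, dim = len(seq), len(seq[0])
    state = [list(v) for v in seq]
    cum = [0.0] * n  # running sum of halting probabilities per position
    halted = [False] * n
    steps = [0] * n
    ponder = [0.0] * n
    out = [[0.0] * dim for _ in range(n)]
    for t in range(max_steps):
        new_state = step(state)
        # Halted positions keep their last state; the rest take the update.
        state = [old if halted[i] else new
                 for i, (old, new) in enumerate(zip(state, new_state))]
        for i in range(n):
            if halted[i]:
                continue
            p = halt_prob(state[i])
            steps[i] += 1
            if cum[i] + p >= 1.0 - eps or t == max_steps - 1:
                w = 1.0 - cum[i]  # remainder R, so the weights sum to 1
                ponder[i] = steps[i] + w
                halted[i] = True
            else:
                w = p
                cum[i] += p
            out[i] = [o + w * s for o, s in zip(out[i], state[i])]
        if all(halted):
            break
    return out, steps, ponder


# Usage: two hypothetical memory slots (learned in a real model) plus two tokens.
seq = prepend_memory([[0.1, -0.2], [0.3, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
_, eager_steps, _ = act_recurrence(seq, mean_mix_step, make_halt(5.0))
_, deep_steps, _ = act_recurrence(seq, mean_mix_step, make_halt(-2.0))
print(eager_steps)  # [1, 1, 1, 1]: every position halts after one step
print(deep_steps)   # [8, 8, 8, 8]: every position runs to max_steps
```

In this toy the halting bias alone decides how deep the recurrence goes: a bias of 5.0 halts every position after a single application of the shared block, so the recurrence never really iterates, while -2.0 runs every position to the `max_steps` cap. This only illustrates why halting-unit initialization matters; it is not a reproduction of the router trap the paper reports.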