
Simply Stabilizing the Loop via Fully Looped Transformer

arXiv CS.LG · May 20, 2026

Looped Transformers improve model performance by iteratively reusing the same blocks, so effective depth grows without increasing parameter count, but training becomes unstable as the number of loop iterations rises. The paper attributes this instability to gradient oscillation and residual explosion, and proposes the Fully Looped Transformer, which introduces a Fully Looped Architecture and Attention Injection to address it.
