RESEARCH
Simply Stabilizing the Loop via Fully Looped Transformer
arXiv cs.LG · May 20, 2026
Looped Transformers improve model performance by iteratively reusing the same blocks, which adds effective depth without increasing the parameter count. However, training becomes unstable as the number of loop iterations grows. The authors attribute this instability to gradient oscillation and residual explosion, and to address it they propose the Fully Looped Transformer, which introduces a Fully Looped Architecture and Attention Injection.
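To make "reusing blocks without increasing parameter count" concrete, here is a minimal sketch under stated assumptions. It is not the paper's method. A single weight-tied residual block is applied repeatedly: `SharedBlock` is a hypothetical toy MLP standing in for a full transformer block (attention omitted). Depth therefore grows with the loop count while the parameter count stays fixed. The hypothetical helper `run_looped` records the L2 norm of the residual stream after each iteration, the kind of quantity one might monitor for residual explosion. It does not implement the Fully Looped Architecture or Attention Injection, whose details are not described here.

```python
import math
import random


class SharedBlock:
    """Hypothetical stand-in for one transformer block (attention omitted):
    a residual update x + W2 @ relu(W1 @ x). Weights are created once and
    reused on every loop iteration."""

    def __init__(self, dim: int, hidden: int, seed: int = 0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(dim)
        # W1: hidden x dim, W2: dim x hidden
        self.w1 = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, scale) for _ in range(hidden)] for _ in range(dim)]

    def num_params(self) -> int:
        # Independent of how many times the block is looped.
        return sum(len(r) for r in self.w1) + sum(len(r) for r in self.w2)

    def __call__(self, x):
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w1]  # ReLU
        update = [sum(w * hi for w, hi in zip(row, h)) for row in self.w2]
        return [xi + ui for xi, ui in zip(x, update)]  # residual connection


def run_looped(block, x, loops: int):
    """Apply the same block `loops` times. Returns the final state and the
    L2 norm of the residual stream after each iteration."""
    norms = []
    for _ in range(loops):
        x = block(x)
        norms.append(math.sqrt(sum(v * v for v in x)))
    return x, norms


block = SharedBlock(dim=8, hidden=16)
x0 = [1.0] * 8
for loops in (1, 4, 16):
    _, norms = run_looped(block, x0, loops)
    # Parameter count is the same for every loop count; only depth changes.
    print(loops, block.num_params(), round(norms[-1], 3))
```

Because the weights are shared, raising `loops` costs compute but no extra parameters. Any growth in the per-iteration norms comes from repeatedly adding the same block's output to the residual stream, which is why looped models are sensitive to how that stream is kept in check.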