Adaptive Computation Depth via Learned Token Routing in Transformers
arXiv CS.LG · May 8, 2026
This paper introduces Token-Selective Attention (TSA), a mechanism for Transformer architectures that enables adaptive computation depth per token. TSA learns to route tokens based on contextual difficulty, saving 14-23% of token-layer operations with minimal quality loss.
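The summary does not say how TSA makes its routing decisions, so the following is only a minimal sketch of the general idea of per-token layer skipping, not the paper's method. It assumes a linear router with a sigmoid gate and a fixed threshold; the names `route_tokens`, `apply_layer_with_routing`, `fraction_saved`, `router_weights`, and `threshold` are all hypothetical. Tokens the router rejects pass through the layer unchanged, and the skipped share of token-layer operations is what a figure like 14-23% would measure.

```python
import math


def route_tokens(scores, threshold=0.5):
    """Return one boolean per token: True means the layer processes it.

    Assumption: a sigmoid gate compared against a fixed threshold.
    """
    return [1.0 / (1.0 + math.exp(-s)) >= threshold for s in scores]


def apply_layer_with_routing(hidden, router_weights, layer_fn, threshold=0.5):
    """Apply layer_fn only to routed tokens; the rest are copied through.

    hidden:         list of token vectors (lists of floats)
    router_weights: hypothetical linear router, one weight per feature
    layer_fn:       stand-in for a Transformer layer acting on one token
    """
    # Router score per token: dot product of the token with the router weights.
    scores = [sum(w * x for w, x in zip(router_weights, tok)) for tok in hidden]
    mask = route_tokens(scores, threshold)
    out = [layer_fn(tok) if keep else list(tok) for tok, keep in zip(hidden, mask)]
    return out, mask


def fraction_saved(masks):
    """Share of token-layer operations skipped, given one mask per layer."""
    total = sum(len(m) for m in masks)
    skipped = sum(1 for m in masks for keep in m if not keep)
    return skipped / total if total else 0.0
```

In a real model the router would be trained jointly with the network and the skipped tokens would still need to be visible as keys and values to the tokens that are processed; this sketch ignores both points and only illustrates the bookkeeping.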