
Adaptive Computation Depth via Learned Token Routing in Transformers

arXiv CS.LG · May 8, 2026

This paper introduces Token-Selective Attention (TSA), a mechanism for Transformer architectures that enables adaptive computation depth per token. TSA learns to route tokens based on contextual difficulty, saving 14-23% of token-layer operations with minimal quality loss.
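As a rough illustration of per-token routing, the sketch below gates each layer with a scalar router score and counts how many token-layer operations actually run. It is not the paper's implementation: the sigmoid router, the fixed threshold, the `toy_layer` stand-in, and all names are assumptions made for illustration, and the router weights here are random rather than learned.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def route_tokens(hidden, router_weights, threshold=0.5):
    """Return one boolean per token; True means this layer processes the token.

    Assumption: the router is a single linear projection followed by a sigmoid.
    """
    decisions = []
    for vec in hidden:
        score = sigmoid(sum(h * w for h, w in zip(vec, router_weights)))
        decisions.append(score >= threshold)
    return decisions


def toy_layer(vec):
    """Hypothetical stand-in for an attention + MLP block (a small residual update)."""
    return [v + 0.1 * math.tanh(v) for v in vec]


def forward(hidden, routers, threshold=0.5):
    """Apply each layer only to routed tokens and count executed token-layer operations."""
    executed = 0
    for router_weights in routers:
        keep = route_tokens(hidden, router_weights, threshold)
        # Skipped tokens pass through unchanged, as in a residual connection.
        hidden = [toy_layer(vec) if k else vec for vec, k in zip(hidden, keep)]
        executed += sum(keep)
    total = len(hidden) * len(routers)
    return hidden, executed, total


rng = random.Random(0)
num_tokens, dim, num_layers = 8, 4, 6
hidden = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]
routers = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_layers)]

out, executed, total = forward(hidden, routers)
saved_fraction = 1 - executed / total
print(f"executed {executed} of {total} token-layer operations")
```

The value `saved_fraction` corresponds to the kind of quantity the summary reports as saved token-layer operations. With random weights the number is arbitrary; in a trained model the router would presumably be optimized so that skipped tokens are the ones that cost little in quality.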
