
Adaptive Computation Depth via Learned Token Routing in Transformers

arXiv cs.LG · May 8, 2026

This paper introduces Token-Selective Attention (TSA), a mechanism that gives Transformer architectures an adaptive computation depth for each token. TSA learns to route tokens according to their contextual difficulty, reducing token-layer operations by 14–23% with minimal quality loss.
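To make "routing tokens by difficulty" and "token-layer operations" concrete, here is a minimal Python sketch. It is not the TSA method from the paper. The linear-sigmoid router, the fixed 0.5 threshold, the rule that a skipped token passes through a layer unchanged, and the names `adaptive_depth_forward` and `token_layer_savings` are all hypothetical. Each layer is a toy per-token function standing in for a Transformer block, so cross-token attention is ignored, and the router weights are fixed rather than learned.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def router_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Hypothetical linear router: estimated probability that token x needs this layer."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def adaptive_depth_forward(
    tokens: List[Vector],
    layers: List[Callable[[Vector], Vector]],
    routers: List[Tuple[List[float], float]],
    threshold: float = 0.5,
) -> Tuple[List[Vector], int]:
    """Apply each layer only to the tokens its router selects.

    Skipped tokens keep their state (identity path). Returns the final token
    states and the number of token-layer operations actually executed.
    """
    states = [list(t) for t in tokens]
    executed = 0
    for layer, (w, b) in zip(layers, routers):
        for i, x in enumerate(states):
            if router_score(w, b, x) >= threshold:
                update = layer(x)
                # Residual update, as in a standard Transformer block.
                states[i] = [xi + ui for xi, ui in zip(x, update)]
                executed += 1
    return states, executed


def token_layer_savings(executed: int, n_tokens: int, n_layers: int) -> float:
    """Fraction of token-layer operations skipped relative to full depth."""
    return 1.0 - executed / (n_tokens * n_layers)


if __name__ == "__main__":
    # Toy setup (hypothetical): the router favours tokens with a large first coordinate.
    tokens = [[2.0, 0.0], [-2.0, 0.0]]
    layers = [lambda x: [1.0] * len(x)] * 2
    routers = [([1.0, 0.0], 0.0)] * 2
    states, executed = adaptive_depth_forward(tokens, layers, routers)
    print(executed, token_layer_savings(executed, len(tokens), len(layers)))  # prints "2 0.5"
```

In this toy run only the first token clears the router at each layer, so 2 of the 4 possible token-layer operations execute, a saving of 50%. Here the saving is defined as the fraction of token-layer operations skipped, which is one plausible reading of the metric quoted in the summary.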

Tags: neural networks, deep learning, machine learning, efficiency, Transformers
