RESEARCH
Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing
arXiv cs.LG · April 28, 2026
This work addresses the large memory footprint of key-value (KV) caching in transformer language models by optimizing along the depth dimension, that is, across layers. It introduces a method for cross-layer cache sharing and shows that dropping a layer's cache, so that the layer attends to another layer's cache instead, is efficient and loses no information. It also proposes a training approach based on random cross-layer attention, in which layers are stochastically routed to other layers' caches.
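To make the idea concrete, here is a toy Python sketch of depth-wise KV cache sharing with stochastic routing. It is not the paper's algorithm. The routing rule is an assumption: each layer either keeps its own cache or, with probability `p_share`, reads the cache of a uniformly chosen earlier layer that still owns one. The helper names (`sample_routing`, `cached_layers`, `kv_cache_bytes`, `attend`, `forward`) and the model sizes in the usage section are also hypothetical and only illustrative.

```python
# Toy sketch of cross-layer KV cache sharing. The routing rule and all names
# here are illustrative assumptions, not the paper's method or API.
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Cache = Tuple[List[Vector], List[Vector]]  # (keys, values) for one layer


def sample_routing(num_layers: int, p_share: float, rng: random.Random) -> List[int]:
    """Sample route[l] = index of the layer whose KV cache layer l attends to.

    Layer 0 always keeps its own cache. Each later layer, with probability
    p_share, drops its own cache and reads the cache of a uniformly chosen
    earlier layer that still owns one; otherwise route[l] == l.
    """
    route = [0]
    for layer in range(1, num_layers):
        owners = [j for j in range(layer) if route[j] == j]  # never empty: layer 0 owns
        if rng.random() < p_share:
            route.append(rng.choice(owners))
        else:
            route.append(layer)
    return route


def cached_layers(route: Sequence[int]) -> int:
    """Count layers that actually store a KV cache (those routed to themselves)."""
    return sum(1 for layer, src in enumerate(route) if layer == src)


def kv_cache_bytes(cached: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int) -> int:
    """Usual estimate: 2 (K and V) x layers x heads x head_dim x tokens x batch x bytes."""
    return 2 * cached * kv_heads * head_dim * seq_len * batch * bytes_per_elem


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention over one (possibly shared) cache."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]


def forward(queries: List[Vector], layer_kv: List[Cache],
            route: Sequence[int]) -> Tuple[List[Vector], int]:
    """Run each layer's attention against the cache selected by route.

    Only cache-owning layers write to the store; a routed layer reads the
    owner's entry instead of storing its own keys and values.
    """
    store: Dict[int, Cache] = {}
    outputs = []
    for layer, src in enumerate(route):
        if src == layer:
            store[layer] = layer_kv[layer]
        keys, values = store[src]
        outputs.append(attend(queries[layer], keys, values))
    return outputs, len(store)


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only for illustration.
    rng = random.Random(0)
    route = sample_routing(num_layers=32, p_share=0.5, rng=rng)
    owners = cached_layers(route)
    full = kv_cache_bytes(32, kv_heads=8, head_dim=128, seq_len=4096, batch=1, bytes_per_elem=2)
    shared = kv_cache_bytes(owners, kv_heads=8, head_dim=128, seq_len=4096, batch=1, bytes_per_elem=2)
    print(f"{owners}/32 layers store a cache: {shared / 2**20:.0f} MiB vs {full / 2**20:.0f} MiB")
```

Because only layers routed to themselves store keys and values, cache memory in this sketch scales with the number of cache-owning layers rather than with total depth. Resampling the routing on every training step is one simple way to expose each layer to attending over other layers' caches. Whether the paper samples routes this way is not stated in this summary.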