Tensor Cache: Eviction-conditioned Associative Memory for Transformers

arXiv cs.LG · May 25, 2026

This paper introduces Tensor Cache, a two-level memory hierarchy for the Transformer KV cache. The first level (L1) is sliding-window softmax attention over the most recent tokens. The second level (L2) is a fixed-size outer-product fast-weight memory that absorbs tokens as they are evicted from the window, so the model can still retrieve relevant evidence that has fallen outside its context window.
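To make the L1/L2 split concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the class name `TwoLevelCache`, the purely additive write rule (adding the outer product of the evicted value and key), and the separate `read_l1` / `read_l2` methods are all assumptions chosen for illustration. The summary does not say how the paper writes to L2 or how it combines the two reads.

```python
import math
from collections import deque


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


class TwoLevelCache:
    """Toy two-level cache (hypothetical names, assumed update rule)."""

    def __init__(self, window, d_k, d_v):
        self.window = window
        self.d_v = d_v
        self.l1 = deque()  # recent (key, value) pairs, at most `window` of them
        # L2 is a d_v x d_k matrix; its size does not grow with sequence length.
        self.l2 = [[0.0] * d_k for _ in range(d_v)]

    def append(self, key, value):
        """Add a token to L1; if the window overflows, evict the oldest into L2."""
        self.l1.append((key, value))
        if len(self.l1) > self.window:
            old_k, old_v = self.l1.popleft()
            # Assumed write rule: additive outer product value * key^T.
            for i, v_i in enumerate(old_v):
                for j, k_j in enumerate(old_k):
                    self.l2[i][j] += v_i * k_j

    def read_l1(self, query):
        """Scaled softmax attention over the tokens still in the window."""
        out = [0.0] * self.d_v
        if not self.l1:
            return out
        scale = 1.0 / math.sqrt(len(query))
        scores = [scale * dot(query, k) for k, _ in self.l1]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        for w, (_, v) in zip(weights, self.l1):
            for i in range(self.d_v):
                out[i] += (w / total) * v[i]
        return out

    def read_l2(self, query):
        """Linear associative read: multiply the memory matrix by the query."""
        return [dot(row, query) for row in self.l2]


cache = TwoLevelCache(window=2, d_k=2, d_v=1)
cache.append([1.0, 0.0], [5.0])  # evicted into L2 once the window overflows
cache.append([0.0, 1.0], [7.0])
cache.append([0.0, 1.0], [7.0])
print(cache.read_l2([1.0, 0.0]))  # [5.0]: the evicted value is still retrievable
```

In this toy setting the first token is no longer visible to `read_l1`, but querying L2 with its key recovers its value exactly because the stored keys are orthogonal. With non-orthogonal keys a purely additive memory returns a blend of stored values, which is one reason a real design would likely need a more careful write rule than the one assumed here.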

Associative Memory · Deep Learning · AI · Caching · Transformers
