RESEARCH28

TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference

arXiv cs.CL · April 23, 2026

TTKV is a temporal-tiered KV cache management framework for LLMs, inspired by human memory, that addresses the linear growth of KV cache memory with context length. It partitions the cache into tiers of heterogeneous capacity and precision, assigning more recent KV states to faster, higher-precision tiers.
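To make the tiering idea concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: the summary only says that more recent KV states live in faster, higher-precision tiers. The tier names, capacities, bit widths, the "oldest entry moves down one tier" demotion rule, eviction from the last tier, and the simulated uniform quantizer are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


def quantize(vec, bits):
    """Simulate uniform symmetric quantization by snapping to a `bits`-bit grid."""
    levels = 2 ** (bits - 1) - 1
    peak = max(abs(x) for x in vec)
    if peak == 0.0:
        return list(vec)
    scale = peak / levels
    return [round(x / scale) * scale for x in vec]


@dataclass
class Tier:
    name: str      # hypothetical label, e.g. "hot"
    capacity: int  # maximum number of token entries held in this tier
    bits: int      # simulated storage precision for this tier
    entries: deque = field(default_factory=deque)  # (position, vector), oldest first


class TieredKVCache:
    """Toy cache: new entries enter the first tier; overflow cascades downward."""

    def __init__(self, tiers):
        self.tiers = tiers

    def append(self, position, vec):
        """Insert a KV vector; return the entry evicted from the last tier, if any."""
        carry = (position, list(vec))
        for tier in self.tiers:
            pos, v = carry
            # Store at this tier's precision (assumed re-quantization on demotion).
            tier.entries.append((pos, quantize(v, tier.bits)))
            if len(tier.entries) <= tier.capacity:
                return None
            # Tier is over capacity: demote its oldest entry to the next tier.
            carry = tier.entries.popleft()
        return carry  # fell off the last tier (assumed eviction policy)

    def layout(self):
        """Map each tier name to the token positions it currently holds."""
        return {t.name: [pos for pos, _ in t.entries] for t in self.tiers}


def make_cache():
    # Capacities and bit widths are illustrative values, not taken from the paper.
    return TieredKVCache([Tier("hot", 2, 16), Tier("warm", 2, 8), Tier("cold", 4, 4)])


cache = make_cache()
for position in range(6):
    cache.append(position, [0.1 * position, -0.5, 1.0])
print(cache.layout())  # {'hot': [4, 5], 'warm': [2, 3], 'cold': [0, 1]}
```

In this sketch the newest tokens stay in the small 16-bit tier while older ones drift into larger, coarser tiers, which is the recency-to-precision mapping the summary describes. A real system would also have to decide how attention reads across tiers, which this toy does not model.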

neural networks · LLMs · memory management · inference optimization
