RESEARCH28
TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference
arXiv cs.CL · April 23, 2026
TTKV proposes a temporal-tiered KV cache management framework for LLMs. Inspired by human memory, it targets KV cache memory that grows linearly with context length. It partitions the cache into tiers with heterogeneous capacity and precision and assigns more recent KV states to faster, higher-precision tiers.
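The summary does not give TTKV's actual tier sizes, precisions, or migration policy, so the following is only a minimal Python sketch of the general idea of recency-based tiering, under stated assumptions. Tiers are ordered from most to least recent, and each has a fixed token capacity and a quantization bit width. A token that overflows a tier is demoted to the next tier and requantized at that tier's precision, and tokens that overflow the last tier are dropped. Memory speed (for example GPU versus host memory) is not modeled. All names here (`Tier`, `TieredKVCache`, `quantize`, `payload_bits`) are hypothetical and not the paper's API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Tier:
    capacity: int  # max number of token entries held in this tier (assumed fixed)
    bits: int      # quantization bit width; >= 32 means values are stored unquantized
    entries: deque = field(default_factory=deque)


def quantize(vec, bits):
    """Symmetric uniform quantization of a list of floats -> (codes, scale)."""
    if bits >= 32:
        return list(vec), 1.0  # full precision: keep raw floats
    qmax = 2 ** (bits - 1) - 1
    amax = max((abs(x) for x in vec), default=0.0)
    scale = amax / qmax if amax > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(x / scale))) for x in vec]
    return codes, scale


def dequantize(codes, scale):
    return [c * scale for c in codes]


class TieredKVCache:
    """Hypothetical recency-tiered KV cache: tiers[0] is newest and most precise."""

    def __init__(self, tiers):
        self.tiers = tiers

    def append(self, pos, key, value):
        # Every new token enters the top (most recent, highest-precision) tier.
        self._push(0, pos, key, value)

    def _push(self, i, pos, key, value):
        if i >= len(self.tiers):
            return  # past the last tier: evicted (assumption, not from the paper)
        tier = self.tiers[i]
        tier.entries.append((pos, quantize(key, tier.bits), quantize(value, tier.bits)))
        if len(tier.entries) > tier.capacity:
            # Demote this tier's oldest entry; requantize it at the next tier's precision.
            old_pos, (kc, ks), (vc, vs) = tier.entries.popleft()
            self._push(i + 1, old_pos, dequantize(kc, ks), dequantize(vc, vs))

    def materialize(self):
        # Dequantized (pos, key, value) triples in position order, e.g. for attention.
        rows = [
            (pos, dequantize(kc, ks), dequantize(vc, vs))
            for tier in self.tiers
            for pos, (kc, ks), (vc, vs) in tier.entries
        ]
        return sorted(rows, key=lambda r: r[0])

    def tier_positions(self):
        return [[pos for pos, _, _ in tier.entries] for tier in self.tiers]

    def payload_bits(self):
        # Rough payload size: element count times bit width (ignores scales and metadata).
        return sum(
            tier.bits * (len(kc) + len(vc))
            for tier in self.tiers
            for _, (kc, _ks), (vc, _vs) in tier.entries
        )
```

With tiers `[Tier(2, 32), Tier(3, 8), Tier(4, 4)]` and ten appended tokens, the two newest stay at full precision, the next three are held at 8 bits, the next four at 4 bits, and the oldest is evicted. Older entries therefore cost fewer bits per element but carry larger reconstruction error, which is the capacity/precision trade-off the summary describes.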