TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference

arXiv CS.CL · April 23, 2026

TTKV is a temporal-tiered KV cache management framework for LLM inference, inspired by human memory, that addresses the linear growth of KV cache memory with context length. It partitions the cache into tiers with heterogeneous capacity and precision, assigning more recent KV states to faster, higher-precision tiers.
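
The tiering idea can be illustrated with a toy Python sketch. Nothing here is taken from the paper's implementation: the names `TieredKVCache`, `Tier`, and `quantize`, the tier sizes and bit widths, the uniform quantizer, and the choice to evict from the last tier on overflow are all assumptions made for illustration. New entries enter the first (highest-precision) tier, and each overflow demotes the oldest entry one tier down, where it is stored at lower precision.

```python
from collections import deque
from dataclasses import dataclass, field


def quantize(values, bits):
    """Toy uniform quantizer over [-1, 1]; a stand-in for reduced precision."""
    levels = (1 << bits) - 1
    out = []
    for v in values:
        clipped = max(-1.0, min(1.0, v))
        step = round((clipped + 1.0) / 2.0 * levels)
        out.append(step / levels * 2.0 - 1.0)
    return out


@dataclass
class Tier:
    capacity: int  # maximum number of token positions held in this tier
    bits: int      # precision (bits per value) used in this tier
    entries: deque = field(default_factory=deque)  # (position, values), oldest first


class TieredKVCache:
    """Hypothetical temporal-tiered cache: recent entries stay in tier 0."""

    def __init__(self, tiers):
        # tiers: list of (capacity, bits), ordered from most to least recent
        self.tiers = [Tier(capacity=c, bits=b) for c, b in tiers]
        self.next_pos = 0

    def append(self, values):
        """Insert a new KV state; return an evicted position, or None."""
        carry = (self.next_pos, list(values))
        self.next_pos += 1
        for tier in self.tiers:
            pos, vals = carry
            # Store at this tier's precision (re-quantizing on demotion).
            tier.entries.append((pos, quantize(vals, tier.bits)))
            if len(tier.entries) <= tier.capacity:
                return None
            # Overflow: demote this tier's oldest entry to the next tier.
            carry = tier.entries.popleft()
        # Assumption: an entry pushed out of the last tier is dropped.
        return carry[0]

    def layout(self):
        """Positions currently held in each tier, oldest first."""
        return [[pos for pos, _ in tier.entries] for tier in self.tiers]


cache = TieredKVCache([(2, 16), (2, 8), (4, 4)])
for _ in range(5):
    cache.append([0.3, -0.7])
print(cache.layout())  # [[3, 4], [1, 2], [0]]
```

In this sketch the two most recent positions sit in the 16-bit tier, the next two in the 8-bit tier, and the oldest in the 4-bit tier, so memory per token shrinks with age while recent states keep the most precision.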
