RESEARCH27

LKV: End-to-End Learning of Head-wise Budgets and Token Selection for LLM KV Cache Eviction

arXiv CS.LG·May 11, 2026

This paper introduces LKV (Learned KV Eviction), a novel approach to optimize Key-Value (KV) cache memory in Large Language Models (LLMs). LKV formulates KV compression as an end-to-end differentiable optimization problem, learning budgets and token selection to overcome limitations of heuristic methods.

deep learning Memory Optimization efficiency KV cache LLM

Read original ↗