
eOptShrinkQ: Near-Lossless KV Cache Compression Through Optimal Spectral Denoising and Quantization

arXiv cs.LG · May 6, 2026

eOptShrinkQ is a two-stage compression pipeline for the key-value (KV) cache of transformer attention heads. It combines optimal singular value shrinkage (spectral denoising) with per-vector scalar quantization, an approach grounded in random matrix theory, to achieve near-lossless compression with improved reconstruction fidelity.
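To make the two stages concrete, here is a minimal Python/NumPy sketch under stated assumptions. It is not the paper's eOptShrink estimator: it swaps in the Gavish-Donoho Frobenius-optimal shrinker, which assumes i.i.d. white noise with a known level `sigma`. It also assumes that the stages are chained by storing the surviving low-rank per-token coefficients, quantized row by row with asymmetric uniform scalar quantization, plus a float basis. All function names (`shrink_denoise`, `quantize_per_vector`, `compress`, and so on) are hypothetical, and the demo data are synthetic.

```python
import numpy as np


def gd_shrink_frobenius(y, beta):
    """Gavish-Donoho Frobenius-optimal shrinker (white-noise model; a stand-in,
    not eOptShrink). `y` holds singular values divided by sigma * sqrt(max(m, n));
    `beta` = min(m, n) / max(m, n) is the aspect ratio."""
    y = np.asarray(y, dtype=np.float64)
    edge = 1.0 + np.sqrt(beta)  # upper edge of the Marchenko-Pastur noise bulk
    out = np.zeros_like(y)
    keep = y > edge  # values inside the bulk are treated as pure noise
    yk = y[keep]
    out[keep] = np.sqrt((yk**2 - beta - 1.0) ** 2 - 4.0 * beta) / yk
    return out


def shrink_denoise(Y, sigma):
    """Stage 1 (sketch): SVD, shrink singular values, keep surviving components."""
    m, n = Y.shape
    beta = min(m, n) / max(m, n)
    scale = sigma * np.sqrt(max(m, n))
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_hat = scale * gd_shrink_frobenius(s / scale, beta)
    r = int(np.count_nonzero(s_hat))  # shrinker is monotone, so survivors are a prefix
    coeffs = U[:, :r] * s_hat[:r]  # (m, r) per-token coefficients
    basis = Vt[:r, :]              # (r, n) basis shared by all tokens
    return coeffs, basis


def quantize_per_vector(x, bits=8):
    """Stage 2 (sketch): asymmetric uniform quantization, one (step, offset) per row."""
    assert 1 <= bits <= 8, "codes are stored as uint8"
    levels = 2**bits - 1
    if x.shape[1] == 0:  # nothing survived shrinkage
        return (np.zeros(x.shape, np.uint8), np.ones((x.shape[0], 1)),
                np.zeros((x.shape[0], 1)))
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    step = np.where(hi > lo, (hi - lo) / levels, 1.0)  # avoid a zero step on constant rows
    q = np.clip(np.round((x - lo) / step), 0, levels).astype(np.uint8)
    return q, step, lo


def dequantize(q, step, lo):
    return q.astype(np.float64) * step + lo


def compress(Y, sigma, bits=8):
    coeffs, basis = shrink_denoise(Y, sigma)
    return (*quantize_per_vector(coeffs, bits), basis)


def decompress(q, step, lo, basis):
    return dequantize(q, step, lo) @ basis


# Toy "key" matrix: low-rank signal plus white noise (synthetic, not real KV data).
rng = np.random.default_rng(0)
tokens, head_dim, rank, sigma = 512, 64, 4, 0.1
X = rng.standard_normal((tokens, rank)) @ rng.standard_normal((rank, head_dim))
Y = X + sigma * rng.standard_normal((tokens, head_dim))

q, step, lo, basis = compress(Y, sigma, bits=8)
Y_hat = decompress(q, step, lo, basis)


def rel(A):
    """Relative Frobenius error against the clean signal X."""
    return np.linalg.norm(A - X) / np.linalg.norm(X)


print("components kept:", basis.shape[0])
print("relative error, noisy input:", rel(Y))
print("relative error, after compression:", rel(Y_hat))
```

In this toy setting the shrinkage stage discards the noise bulk, so the reconstruction from 8-bit coefficients can land closer to the clean signal than the uncompressed noisy input. Real KV caches need not follow a white-noise model, which is presumably why the paper estimates the shrinkage from data rather than assuming a known `sigma`.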

quantization · Random matrix theory · AI compression · KV cache · Transformers
