RESEARCH · arXiv cs.LG · 5/6/2026
eOptShrinkQ: Near-Lossless KV Cache Compression Through Optimal Spectral Denoising and Quantization
eOptShrinkQ is a two-stage pipeline for compressing the KV cache of transformer attention heads. It first denoises the cache with optimal singular value shrinkage, then applies per-vector scalar quantization. The design is grounded in random matrix theory and aims for near-lossless compression with improved reconstruction.
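To make the two stages concrete, here is a minimal NumPy sketch of a shrink-then-quantize pipeline. It is not the paper's eOptShrink procedure. As a stand-in for stage one it uses the well-known Frobenius-optimal singular value shrinker for white noise with a known standard deviation `sigma`. For stage two it uses plain uniform quantization with one offset and scale per row. The function names, the `sigma` and `bits` parameters, the white-noise assumption, and the choice to quantize the denoised matrix directly are all assumptions made for illustration.

```python
import numpy as np

def shrink_singular_values(X, sigma):
    """Stage 1 (stand-in, not eOptShrink): Frobenius-optimal singular value
    shrinkage, assuming additive white noise with known std `sigma`."""
    m, n = X.shape
    transposed = m > n
    if transposed:                       # work with the wide orientation (m <= n)
        X, m, n = X.T, n, m
    beta = m / n                         # aspect ratio, 0 < beta <= 1
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    y = s / (np.sqrt(n) * sigma)         # singular values in noise units
    inside = (y**2 - beta - 1.0) ** 2 - 4.0 * beta
    # Values at or below the noise bulk edge 1 + sqrt(beta) are set to zero.
    eta = np.where(
        y > 1.0 + np.sqrt(beta),
        np.sqrt(np.maximum(inside, 0.0)) / np.maximum(y, 1e-12),
        0.0,
    )
    X_hat = (U * (eta * np.sqrt(n) * sigma)) @ Vt
    return X_hat.T if transposed else X_hat

def quantize_per_vector(X, bits=4):
    """Stage 2: uniform scalar quantization, one (offset, scale) per row.
    Assumes bits <= 8 so codes fit in uint8."""
    levels = 2**bits - 1
    lo = X.min(axis=1, keepdims=True)
    span = X.max(axis=1, keepdims=True) - lo
    scale = np.where(span > 0, span / levels, 1.0)   # avoid division by zero
    codes = np.round((X - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Map integer codes back to floating-point values."""
    return codes * scale + lo

# Synthetic stand-in for one head's key matrix: rank-2 signal plus white noise.
rng = np.random.default_rng(0)
signal = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 128))
noisy = signal + 0.1 * rng.standard_normal((64, 128))

denoised = shrink_singular_values(noisy, sigma=0.1)
codes, lo, scale = quantize_per_vector(denoised, bits=4)
reconstructed = dequantize(codes, lo, scale)
```

In this sketch the per-row offset and scale keep the rounding error of each element within half a quantization step of that row, which is the usual motivation for quantizing vector by vector rather than with one global range.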
