
eOptShrinkQ: Near-Lossless KV Cache Compression Through Optimal Spectral Denoising and Quantization

arXiv CS.LG · May 6, 2026

eOptShrinkQ is a two-stage compression pipeline for the KV cache of transformer attention heads. Grounded in random matrix theory, it combines optimal singular value shrinkage with per-vector scalar quantization to achieve near-lossless compression and improve reconstruction.
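
To make the two-stage idea concrete, here is a minimal NumPy sketch of "denoise spectrally, then quantize each vector". It is an illustration under stated assumptions, not the paper's method: the shrinkage step is a plain hard threshold at the noise bulk edge, swapped in as a stand-in for eOptShrink, and the noise level `noise_sigma` is assumed to be known. All function and parameter names are hypothetical.

```python
import numpy as np


def shrink_singular_values(X, noise_sigma):
    """Stage 1 (stand-in): zero out singular values inside the noise bulk.

    Assumption: i.i.d. noise with known standard deviation `noise_sigma`.
    This hard threshold is NOT the paper's optimal shrinker.
    """
    n, d = X.shape
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Approximate largest singular value of an n x d pure-noise matrix.
    edge = noise_sigma * (np.sqrt(n) + np.sqrt(d))
    s_shrunk = np.where(s > edge, s, 0.0)
    return (U * s_shrunk) @ Vt


def quantize_per_vector(X, bits=4):
    """Stage 2: uniform scalar quantization with one scale/offset per row."""
    levels = 2**bits - 1  # bits <= 8 so that codes fit in uint8
    lo = X.min(axis=1, keepdims=True)
    hi = X.max(axis=1, keepdims=True)
    scale = (hi - lo) / levels
    scale = np.where(scale == 0, 1.0, scale)  # guard constant rows
    codes = np.clip(np.round((X - lo) / scale), 0, levels).astype(np.uint8)
    return codes, scale, lo


def dequantize_per_vector(codes, scale, lo):
    """Map integer codes back to floating-point values."""
    return codes.astype(np.float64) * scale + lo
```

In this sketch, a cached key or value matrix with one vector per row would pass through `shrink_singular_values` first, and the result would be stored as the `codes`, `scale`, and `lo` arrays returned by `quantize_per_vector`. Each reconstructed entry then differs from its denoised value by at most half a quantization step for that row.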
