AI compression

2 items

RESEARCH · arXiv CS.LG · 5/6/2026

eOptShrinkQ: Near-Lossless KV Cache Compression Through Optimal Spectral Denoising and Quantization

eOptShrinkQ is a two-stage pipeline for compressing the KV cache of transformer attention heads. It combines optimal singular value shrinkage (spectral denoising) with per-vector scalar quantization, an approach grounded in random matrix theory, to achieve near-lossless compression with improved reconstruction.

quantization · Random matrix theory · AI compression · KV cache

ARTICLE · KDnuggets · 25d ago

TurboQuant: Is the Compression and Performance Worth the Hype?

This article examines TurboQuant's compression and performance claims, questioning whether it can boost efficiency without accuracy loss and whether the technology lives up to its hype.

efficiency · AI compression · model optimization · performance
