Statistical Inference and Quality Measures of KV Cache Quantisations Inspired by TurboQuant
arXiv cs.LG · May 12, 2026
This paper analyzes three KV cache quantization schemes (KV, KQV, QKQV) and their effect on inner-product variance, in particular how applying QJL to K inflates that variance and how the softmax then amplifies the resulting error. Three empirical findings stand out: KQV performs best at a budget of n=4; a K-V asymmetry holds unconditionally, with QKQV consistently worse than KQV in KL divergence; and geometric K reconstruction shows budget-dependent crossovers.
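The variance effect can be illustrated with a minimal sketch, assuming the generic QJL-style estimator: keys are projected with a Gaussian matrix and reduced to sign bits, the query is projected but left unquantized, and the inner product is rescaled by sqrt(pi/2) times the key norm. The dimensions, sketch size `m`, trial count, and helper names below are illustrative assumptions, not values or code from the paper, and the KV, KQV, and QKQV schemes themselves are not reproduced.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(logits):
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions over the same support
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def sketch_inner_products(q, keys, m, rng):
    """Estimate <q, k> for each key from a 1-bit sign sketch of k.

    Hypothetical helper: one Gaussian matrix S (m x d) is shared by the
    query and all keys, as in a QJL-style sketch.
    """
    d = len(q)
    S = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    Sq = [dot(row, q) for row in S]  # projected query, not quantized
    estimates = []
    for k in keys:
        signs = [1.0 if dot(row, k) >= 0 else -1.0 for row in S]
        scale = math.sqrt(math.pi / 2) * math.sqrt(dot(k, k)) / m
        estimates.append(scale * dot(Sq, signs))
    return estimates


rng = random.Random(0)
d, m, n_keys, trials = 16, 64, 8, 200  # illustrative sizes only

q = [rng.gauss(0.0, 1.0) for _ in range(d)]
keys = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_keys)]

# Exact attention logits and the resulting attention distribution
exact = [dot(q, k) / math.sqrt(d) for k in keys]
p_exact = softmax(exact)

# Repeat the sketch with fresh random projections to see its spread
runs = [
    [z / math.sqrt(d) for z in sketch_inner_products(q, keys, m, rng)]
    for _ in range(trials)
]

mean_est = [sum(run[i] for run in runs) / trials for i in range(n_keys)]
var_est = [
    sum((run[i] - mean_est[i]) ** 2 for run in runs) / (trials - 1)
    for i in range(n_keys)
]
mean_kl = sum(kl_divergence(p_exact, softmax(run)) for run in runs) / trials

print("max |bias| of logits:", max(abs(a - b) for a, b in zip(mean_est, exact)))
print("mean logit variance :", sum(var_est) / n_keys)
print("mean KL after softmax:", mean_kl)
```

In this toy setting the estimated logits average out close to the exact ones, but each individual estimate carries variance, and that noise shows up as a nonzero KL divergence between the exact and sketched attention distributions once the softmax is applied.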