RESEARCH · arXiv CS.LG · 4/20/2026
The Illusion of Equivalence: Systematic FP16 Divergence in KV-Cached Autoregressive Inference
Under standard FP16 precision, KV caching in autoregressive transformer inference systematically changes the decoded token sequence, because the cached and uncached paths accumulate floating-point sums in different orders. Across LLaMA-2-7B, Mistral-7B, and Gemma-2-2B, a 100% token divergence rate was observed, with cache-ON often leading to higher accuracy.
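The mechanism named above, accumulation order under FP16, can be illustrated with a toy sketch. This is not the paper's experiment: the helper `fp16_sum`, the three terms, and the competitor logit `2049.0` are hypothetical values chosen only because FP16 spacing at 2048 is 2, which makes the rounding easy to follow by hand.

```python
import numpy as np


def fp16_sum(values):
    """Sum left to right, rounding to FP16 after every addition."""
    acc = np.float16(0.0)
    for v in values:
        acc = np.float16(acc + np.float16(v))
    return acc


# Same three terms, two accumulation orders (toy values, not from the paper).
terms = [2048.0, 1.0, 1.0]
forward = fp16_sum(terms)         # (2048 + 1) + 1: each +1 is rounded away
backward = fp16_sum(terms[::-1])  # (1 + 1) + 2048: the 2 survives

print(float(forward), float(backward))  # 2048.0 2050.0

# A hypothetical competing logit that sits between the two results.
competitor = 2049.0
token_forward = int(np.argmax([float(forward), competitor]))
token_backward = int(np.argmax([float(backward), competitor]))

print(token_forward, token_backward)  # 1 0
```

Both orders add the same numbers, yet the FP16 results differ by 2, and a greedy argmax over the two toy logits picks a different index in each case. Real attention kernels differ in far subtler ways, so this should be read only as a demonstration that FP16 addition is not associative.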