KV Inflation — AI articles, news & research

RESEARCH · arXiv cs.CL · 4/15/2026

LoSA: Locality Aware Sparse Attention for Block-Wise Diffusion Language Models

LoSA (Locality Aware Sparse Attention) targets two problems in block-wise diffusion language models that are most severe at long context lengths: memory-bound attention and KV Inflation. It reuses cached attention results for stable tokens and applies sparse attention only to active tokens, which significantly reduces the number of KV indices that have to be loaded.
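
The split between stable and active tokens can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the precomputed `is_active` mask, and the top-k selection rule are all assumptions made for illustration, since the summary does not say how tokens are classified or how sparse positions are chosen.

```python
import math

def sparse_attention(q, keys, values, top_k):
    """Attend over only the top_k keys with the highest dot-product score.

    Top-k selection is an assumed sparsity rule, not taken from the paper.
    """
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    picked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Numerically stable softmax restricted to the picked positions.
    m = max(scores[i] for i in picked)
    weights = [math.exp(scores[i] - m) for i in picked]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, picked)) / total
           for d in range(dim)]
    return out, set(picked)

def block_attention(queries, is_active, cached_out, keys, values, top_k):
    """Hypothetical per-block step: reuse cache for stable tokens, recompute active ones.

    is_active[i] is assumed to be given; how a token is judged stable or
    active is outside this sketch.
    Returns the per-token outputs and the set of KV indices actually loaded.
    """
    outputs, loaded = [], set()
    for q, active, cached in zip(queries, is_active, cached_out):
        if active:
            out, picked = sparse_attention(q, keys, values, top_k)
            loaded |= picked          # only active tokens add KV indices
        else:
            out = cached              # stable token: no KV loading at all
        outputs.append(out)
    return outputs, loaded

keys = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]
cached = [[9.0, 9.0], [7.0, 7.0]]

outputs, loaded = block_attention(queries, [True, False], cached, keys, values, top_k=1)
_, loaded_all = block_attention(queries, [True, True], cached, keys, values, top_k=1)
```

In this toy setup, the run with one stable token loads fewer KV indices than the run where every token is treated as active, because each active query can select different positions and the union grows with the number of active queries.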

Memory Optimization · Long Context · KV Inflation · Sparse Attention