Research · arXiv cs.CL · 4/15/2026
LoSA: Locality Aware Sparse Attention for Block-Wise Diffusion Language Models
LoSA introduces Locality Aware Sparse Attention to address memory-bound attention and the KV Inflation problem in block-wise diffusion language models, particularly in long-context settings. It reuses cached attention for stable tokens and applies sparse attention only to active tokens, which substantially reduces KV index loading.
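This summary does not describe LoSA's actual token-stability criterion or sparsity pattern. The following is only a toy plain-Python sketch of the general idea it states: cache each token's attention output, reuse that output for stable tokens, and recompute attention over a sparse subset of keys only for active tokens. The names `attend`, `local_window`, and `block_step` are hypothetical. The fixed local window is an assumption standing in for whatever selection rule the paper uses, and the caller decides which tokens are active.

```python
import math
from typing import Dict, List, Set, Tuple

Vector = List[float]


def attend(q: Vector, keys: List[Vector], values: List[Vector], idx: List[int]) -> Vector:
    """Softmax attention of query q over only the key/value rows listed in idx."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, keys[j])) for j in idx]
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, j in zip(weights, idx):
        for d, v in enumerate(values[j]):
            out[d] += (w / total) * v
    return out


def local_window(pos: int, n_keys: int, window: int) -> List[int]:
    """HYPOTHETICAL sparsity pattern (an assumption): keys within `window` of `pos`."""
    return list(range(max(0, pos - window), min(n_keys, pos + window + 1)))


def block_step(
    queries: List[Vector],
    positions: List[int],
    keys: List[Vector],
    values: List[Vector],
    active: Set[int],
    cache: Dict[int, Vector],
    window: int,
) -> Tuple[List[Vector], int]:
    """One denoising step over a block (illustrative only).

    Stable tokens (cached and not in `active`) reuse their cached attention
    output and read no KV rows. Active tokens recompute attention over a
    sparse key subset and refresh the cache. Returns the outputs and the
    number of KV rows loaded during this step.
    """
    outputs: List[Vector] = []
    kv_loads = 0
    for i, (q, pos) in enumerate(zip(queries, positions)):
        if i in cache and i not in active:
            outputs.append(cache[i])  # stable token: reuse cached result
            continue
        idx = local_window(pos, len(keys), window)
        kv_loads += len(idx)
        cache[i] = attend(q, keys, values, idx)
        outputs.append(cache[i])
    return outputs, kv_loads


if __name__ == "__main__":
    keys = [[float(j), 1.0] for j in range(100)]
    values = [[float(j)] for j in range(100)]
    queries = [[0.1, 0.2]] * 4
    positions = [96, 97, 98, 99]
    cache: Dict[int, Vector] = {}
    # First step: every token in the block is active.
    _, first = block_step(queries, positions, keys, values, {0, 1, 2, 3}, cache, 8)
    # Later step: only token 2 is still changing; the rest reuse the cache.
    _, second = block_step(queries, positions, keys, values, {2}, cache, 8)
    print(first, second)  # prints "42 10"
```

For comparison, dense attention from these four queries over all 100 keys would read 400 KV rows per step. In this toy setup, the sparse window brings the first step down to 42, and reusing cached outputs for stable tokens brings the second step down to 10.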