RESEARCH

LoSA: Locality Aware Sparse Attention for Block-Wise Diffusion Language Models

arXiv cs.CL · April 15, 2026

LoSA (Locality Aware Sparse Attention) targets two problems in block-wise diffusion language models, both most pronounced at long context lengths: memory-bound attention and KV Inflation. It reuses cached attention results for stable tokens and applies sparse attention only to active tokens, which significantly reduces KV index loading.
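The reuse-versus-recompute split described above can be illustrated with a toy, pure-Python sketch. It is not the paper's implementation. The stability test (a query that moved less than `tol` since the previous denoising step), the top-k key selection, and every name here (`block_step`, `sparse_attention`, `tol`, `top_k`) are assumptions made only for illustration. The sketch counts the union of KV indices selected by active tokens as a rough proxy for KV loading.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sparse_attention(q, keys, values, top_k):
    """Attend only to the top_k highest-scoring keys (hypothetical selection rule).

    Returns the attention output and the set of KV indices that were used.
    """
    scores = [dot(q, k) / math.sqrt(len(q)) for k in keys]
    chosen = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)[:top_k]
    peak = max(scores[j] for j in chosen)  # subtract the max for numerical stability
    weights = [math.exp(scores[j] - peak) for j in chosen]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * values[j][d] for w, j in zip(weights, chosen)) / total
           for d in range(dim)]
    return out, set(chosen)


def block_step(queries, prev_queries, cached_outputs, keys, values, top_k=2, tol=1e-3):
    """One denoising step over a block of tokens.

    Stable tokens (assumed here: query drift <= tol) reuse their cached output.
    Active tokens recompute with sparse attention and contribute KV indices.
    """
    outputs, active, loaded = [], [], set()
    for i, q in enumerate(queries):
        drift = max(abs(a - b) for a, b in zip(q, prev_queries[i]))
        if drift <= tol and cached_outputs[i] is not None:
            outputs.append(cached_outputs[i])  # stable: no KV entries touched
        else:
            out, chosen = sparse_attention(q, keys, values, top_k)
            outputs.append(out)
            active.append(i)
            loaded |= chosen  # union of KV indices needed by active tokens
    return outputs, active, loaded


# Toy data: 4 cached KV entries, a block of 3 tokens, only token 2 changes.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
prev_q = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
cached = [sparse_attention(q, keys, values, 2)[0] for q in prev_q]
new_q = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.2]]

outputs, active, loaded = block_step(new_q, prev_q, cached, keys, values)
print(active, sorted(loaded))  # [2] [1, 3]
```

In this toy run only one of the three tokens is active, so KV indices are gathered for that token alone. A real system would also need a cheaper way to pick keys than scoring every cached entry, which this sketch does not attempt.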

Memory Optimization · Long Context · KV Inflation · Sparse Attention · Diffusion Language Models
