RESEARCH
Forget, Then Recall: Learnable Compression and Selective Unfolding via Gist Sparse Attention
arXiv cs.LG · April 24, 2026
This paper introduces Gist Sparse Attention (GSA), an end-to-end learnable method for scaling large language models to long contexts without architectural modifications. GSA first compresses the context into 'gist tokens' that summarize it, then selectively restores the relevant raw chunks for detailed attention. The result combines compact global representations with targeted fine-grained access.
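The compress-then-unfold idea can be illustrated for a single query vector with the sketch below. It is a minimal illustration under stated assumptions, not the paper's GSA. GSA learns its gist tokens end-to-end, but here each chunk's gist is simply the mean of its key and value vectors. Chunk relevance is assumed to be the query-gist dot product. The names `gist_sparse_attention`, `mean_pool`, `chunk_size`, and `top_k` are hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(vectors: List[Vec]) -> Vec:
    # ASSUMPTION: stand-in for GSA's learned gist tokens; it just averages the chunk.
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def gist_sparse_attention(query: Vec, keys: List[Vec], values: List[Vec],
                          chunk_size: int, top_k: int) -> Tuple[Vec, List[int]]:
    """Hypothetical single-query sketch of compress-then-unfold attention."""
    n = len(keys)
    chunks = [list(range(s, min(s + chunk_size, n))) for s in range(0, n, chunk_size)]

    # "Forget": compress each chunk into one gist key and one gist value.
    gist_keys = [mean_pool([keys[i] for i in c]) for c in chunks]
    gist_vals = [mean_pool([values[i] for i in c]) for c in chunks]

    # Rank chunks by query-gist similarity (ASSUMPTION: plain dot product).
    gist_scores = [dot(query, g) for g in gist_keys]
    ranked = sorted(range(len(chunks)), key=lambda j: gist_scores[j], reverse=True)
    selected = sorted(ranked[:top_k])
    unfold = set(selected)

    # "Recall": selected chunks contribute their raw tokens, all others only their gist.
    att_keys: List[Vec] = []
    att_vals: List[Vec] = []
    for j, c in enumerate(chunks):
        if j in unfold:
            att_keys.extend(keys[i] for i in c)
            att_vals.extend(values[i] for i in c)
        else:
            att_keys.append(gist_keys[j])
            att_vals.append(gist_vals[j])

    # Scaled dot-product attention over the mixed gist/raw sequence.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in att_keys])
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, att_vals)) for d in range(dim)]
    return out, selected
```

When `top_k` equals the number of chunks, every chunk is unfolded and the function reduces to ordinary dense attention. A smaller `top_k` shortens the attended sequence, so most of the context is seen only through its gists.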