← heapsort-ai

Attention Mechanism

8 items

ARTICLE↑ trendingReddit r/MachineLearning·4/11/2026

FlashAttention (FA1–FA4) in PyTorch - educational implementations focused on algorithmic differences [P]

An updated PyTorch repository features educational implementations of FlashAttention versions FA1 through FA4. The focus is on demonstrating the algorithmic differences and evolution of the method, facilitating an understanding of its design ideas without delving into hardware specifics.

45
RESEARCHarXiv CS.CL·4/13/2026

WAND: Windowed Attention and Knowledge Distillation for Efficient Autoregressive Text-to-Speech Models

WAND introduces a framework to adapt pretrained autoregressive text-to-speech (AR-TTS) models for constant computational and memory complexity. It achieves this by separating attention into global and local sliding-window mechanisms, employing curriculum learning, and utilizing knowledge distillation to maintain high-fidelity speech synthesis with significant KV cache memory reduction.

27