WAND: Windowed Attention and Knowledge Distillation for Efficient Autoregressive Text-to-Speech Models
arXiv cs.CL · April 13, 2026
WAND is a framework that adapts pretrained autoregressive text-to-speech (AR-TTS) models to run with constant computational and memory complexity. It splits attention into a global mechanism and a local sliding-window mechanism, trains with curriculum learning, and uses knowledge distillation to preserve high-fidelity speech synthesis while significantly reducing KV cache memory.
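The global/local split can be illustrated with a small attention-mask sketch. This is not the authors' implementation: it assumes the global part covers a fixed-length prefix of the sequence (for example, conditioning tokens) and that every other position attends causally within a sliding window. The names `global_local_mask`, `keys_attended_per_step`, `num_global`, and `window` are hypothetical and chosen only for illustration.

```python
def global_local_mask(seq_len, num_global, window):
    """Build a boolean mask where mask[i][j] is True if query i may attend to key j.

    Assumptions (not taken from the paper):
      - the first `num_global` positions are visible to every later query;
      - all other keys are visible only inside a causal window of `window` positions.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i              # never look at future positions
            is_global = j < num_global   # always-visible prefix
            is_local = i - j < window    # inside the sliding window
            row.append(causal and (is_global or is_local))
        mask.append(row)
    return mask


def keys_attended_per_step(mask):
    """Count how many keys each query position attends to."""
    return [sum(row) for row in mask]


if __name__ == "__main__":
    counts = keys_attended_per_step(global_local_mask(seq_len=8, num_global=2, window=3))
    print(counts)
```

Under these assumptions, the number of keys any position attends to never exceeds `num_global + window`, regardless of sequence length. That bound is the intuition behind a KV cache whose size stays constant during generation, since keys outside the prefix and the window no longer need to be kept.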
Knowledge Distillation · Autoregressive Text-to-Speech · Attention Mechanism · Computational Efficiency · Memory Reduction