GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding

arXiv cs.LG · May 18, 2026

This paper introduces Group-Query Latent Attention (GQLA), a modification of Multi-head Latent Attention (MLA). GQLA exposes two algebraically equivalent decoding paths, so a single set of trained weights can be served efficiently on different hardware platforms, such as H100 and H20, without retraining.
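To make "algebraically equivalent decoding paths" concrete, here is a minimal sketch of the general idea in a latent-attention setting. It is not the paper's GQLA formulation, which this summary does not spell out. It only shows the familiar reassociation in which a key up-projection can be applied to the cached latents or folded into the query instead. All names (`W_uk`, `latents`, `q`) and the toy sizes are assumptions chosen for illustration.

```python
import random

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    """Return the transpose of matrix M."""
    return [list(col) for col in zip(*M)]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

random.seed(0)
d_head, d_latent, n_tokens = 4, 2, 3  # toy sizes (assumed)

# Hypothetical up-projection from latent space to key space: d_head x d_latent.
W_uk = [[random.gauss(0, 1) for _ in range(d_latent)] for _ in range(d_head)]
# Hypothetical cached latent vectors, one per past token.
latents = [[random.gauss(0, 1) for _ in range(d_latent)] for _ in range(n_tokens)]
# Hypothetical query for the current decoding step.
q = [random.gauss(0, 1) for _ in range(d_head)]

# Path A: expand every cached latent into a full key, then score it.
scores_expand = [dot(q, matvec(W_uk, c)) for c in latents]

# Path B: fold the up-projection into the query once, then score
# directly against the compact latents.
q_latent = matvec(transpose(W_uk), q)
scores_absorb = [dot(q_latent, c) for c in latents]

# Both paths compute q^T (W_uk c); they differ only in floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(scores_expand, scores_absorb))
print(max_diff < 1e-9)
```

The two paths produce the same attention scores from the same weights, but they do different amounts of arithmetic and touch different amounts of memory. That is the kind of trade-off that could favor one path on one accelerator and the other path elsewhere.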
