Unlocking Feature Learning in Gated Delta Networks at Scale
arXiv CS.LG · June 4, 2026
This paper derives scaling rules for Gated Delta Networks, motivated by the computational cost of training and scaling large language models. Experiments show that, unlike standard parametrization, the resulting configurations allow learning rates to transfer stably across model widths.