A Self-Attentive Meta-Optimizer with Group-Adaptive Learning Rates and Weight Decay
arXiv cs.LG · May 7, 2026
MetaAdamW is an optimizer that uses a self-attention mechanism to adjust the learning rate and weight decay dynamically for each parameter group, addressing a limitation of adaptive optimizers: they apply the same hyperparameters uniformly across all groups. Its attention module is trained with a meta-learning objective that combines gradient alignment, loss decrease, and the generalization gap.
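
To make the idea of group-adaptive hyperparameters concrete, here is a minimal sketch of how attention over per-group statistics could yield a learning-rate multiplier and a weight-decay multiplier for each parameter group. It is not the paper's implementation. The feature choice, the identity projections standing in for learned attention weights, the sigmoid head, the `[low, high]` range, and every function name are assumptions made for illustration; the meta-learning objective that trains the attention module is not shown, and a plain gradient step replaces Adam's moment estimates for brevity.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Single-head scaled dot-product self-attention across parameter groups.

    Assumption: identity query/key/value projections stand in for the
    learned weights a trained attention module would have.
    """
    d = len(features[0])
    attended = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in features]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, features)) for j in range(d)]
        )
    return attended


def group_multipliers(features, low=0.5, high=1.5):
    """Map attended features to one (lr_mult, wd_mult) pair per group.

    Assumption: a sigmoid head squashes the first two attended
    coordinates into the hypothetical range [low, high].
    """
    mults = []
    for ctx in self_attention(features):
        lr_gate = 1.0 / (1.0 + math.exp(-ctx[0]))
        wd_gate = 1.0 / (1.0 + math.exp(-ctx[1]))
        mults.append((low + (high - low) * lr_gate, low + (high - low) * wd_gate))
    return mults


def apply_step(params, grads, base_lr, base_wd, mults):
    """One decoupled-weight-decay step with per-group lr and wd.

    Assumption: the raw gradient is used as the update direction;
    AdamW would use its bias-corrected moment estimates instead.
    """
    new_params = []
    for p_group, g_group, (lr_m, wd_m) in zip(params, grads, mults):
        lr = base_lr * lr_m
        wd = base_wd * wd_m
        new_params.append([p - lr * (g + wd * p) for p, g in zip(p_group, g_group)])
    return new_params


# Hypothetical per-group features, e.g. [log grad norm, log weight norm].
features = [[0.2, -0.1], [-0.4, 0.3], [0.0, 0.0]]
params = [[1.0, -2.0], [0.5], [3.0]]
grads = [[0.1, 0.2], [-0.3], [0.0]]

mults = group_multipliers(features)
params = apply_step(params, grads, base_lr=1e-3, base_wd=1e-2, mults=mults)
```

Because attention runs across groups, each group's multipliers depend on the statistics of every other group rather than on its own in isolation. Bounding the multipliers keeps each group's effective learning rate and weight decay within a fixed factor of the base values.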