RESEARCH27

A Self-Attentive Meta-Optimizer with Group-Adaptive Learning Rates and Weight Decay

arXiv CS.LG·May 7, 2026

MetaAdamW is a novel optimizer that employs a self-attention mechanism to dynamically adjust per-group learning rates and weight decay, addressing the limitation of uniform hyperparameters in adaptive optimizers. Its attention module is trained via a meta-learning objective, integrating gradient alignment, loss decrease, and generalization gap.

Meta-Learning deep learning learning AI Research optimizers

Read original ↗