Why SSMs struggle in parameter-constrained training: empirical findings at 25M parameters [R]
Reddit r/MachineLearning · May 4, 2026
This post reports empirical findings from OpenAI's Parameter Golf competition and explains why State Space Models (SSMs) are structurally disadvantaged relative to transformers in parameter- and time-constrained training regimes. Key issues include SSM in_proj weights that compress worse and architectural wins that reverse at larger vocabulary sizes. The post also covers insights from Mamba-3 Triton kernel experiments.