
Neural network training


RESEARCH · trending · Reddit r/MachineLearning · 5/4/2026

Why SSMs struggle in parameter-constrained training: empirical findings at 25M parameters [R]

This post details empirical findings from OpenAI's Parameter Golf competition, explaining why State Space Models (SSMs) are structurally disadvantaged compared to transformers in parameter- and time-constrained training regimes. Key issues include worse in_proj weight compression for SSMs and architectural win reversals at higher vocabulary sizes, alongside insights from Mamba-3 Triton kernel experiments.
