RESEARCH · arXiv cs.LG · 4/23/2026

Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts

This paper proposes Expert Upcycling, a method that progressively expands the capacity of Mixture-of-Experts (MoE) large language models during continued pre-training. It increases the number of experts by duplicating existing experts and extending the router to match, which gives the added experts a warm initialization. The aim is to reduce training cost while preserving per-token inference cost.
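
The duplicate-and-extend idea can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name `upcycle_moe`, the tensor layouts, the `factor` parameter, and the optional router noise used to break ties between copies are all assumptions made for illustration.

```python
import numpy as np

def upcycle_moe(expert_weights, router_weights, factor=2, noise_std=0.0, seed=0):
    """Hypothetical sketch: grow an MoE layer from E to E * factor experts.

    expert_weights: array of shape (E, d_in, d_out), one weight matrix per expert.
    router_weights: array of shape (d_model, E), one routing column per expert.
    """
    rng = np.random.default_rng(seed)

    # Duplicate every expert: copy k of expert i lands at index i + k * E.
    new_experts = np.tile(expert_weights, (factor, 1, 1))

    # Extend the router the same way, so each copy starts with the routing
    # column of the expert it was copied from (a warm initialization).
    new_router = np.tile(router_weights, (1, factor))

    # Assumption: optional small noise so the copies can diverge in training.
    if noise_std > 0:
        new_router = new_router + rng.normal(0.0, noise_std, size=new_router.shape)

    return new_experts, new_router


# Usage: expand a toy layer from 2 experts to 4.
experts = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
router = np.arange(5 * 2, dtype=float).reshape(5, 2)
big_experts, big_router = upcycle_moe(experts, router, factor=2)
print(big_experts.shape, big_router.shape)
```

In this sketch the per-token cost would stay the same only if the number of experts activated per token (the router's top-k) is left unchanged after expansion, which is an assumption about how the method preserves inference cost rather than a detail stated in the summary.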

Model Architecture · training-optimization · large language models