RESEARCH 28

Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts

arXiv CS.LG·April 23, 2026

Expert Upcycling proposes a method to progressively expand Mixture-of-Experts (MoE) capacity in large language models during continued pre-training. It increases the number of experts via duplication and router extension to provide a warm initialization, aiming to reduce training costs while preserving per-token inference cost.
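As a rough illustration of the duplication and router-extension step described above, the following is a minimal NumPy sketch, not the paper's implementation. The function name `upcycle_experts`, the tensor layouts, the `factor` argument, and the optional tie-breaking noise are all assumptions made for this example.

```python
import numpy as np


def upcycle_experts(expert_weights, router_weights, factor=2, noise_std=0.0, seed=0):
    """Hypothetical helper: grow an MoE layer from E to factor * E experts.

    expert_weights: array of shape (E, d_in, d_out), one weight matrix per expert.
    router_weights: array of shape (d_in, E), one logit column per expert.
    """
    rng = np.random.default_rng(seed)

    # Duplication: new expert j is an exact copy of old expert j % E.
    new_experts = np.tile(expert_weights, (factor, 1, 1))

    # Router extension: new logit column j copies old column j % E,
    # so every copy starts with the same routing score as its parent.
    new_router = np.tile(router_weights, (1, factor))

    # Assumption for this sketch: optional small noise lets copies diverge
    # during continued pre-training instead of staying tied.
    if noise_std > 0:
        new_router = new_router + rng.normal(0.0, noise_std, new_router.shape)

    return new_experts, new_router


# Toy usage: 4 experts grown to 8, with hidden size 16 and expert output size 32.
rng = np.random.default_rng(1)
experts = rng.normal(size=(4, 16, 32))
router = rng.normal(size=(16, 4))
big_experts, big_router = upcycle_experts(experts, router, factor=2)
```

In this sketch the number of experts selected per token is left unchanged, which is one way to read the claim that capacity grows while per-token inference cost is preserved.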

Model Architecture · training-optimization · large language models
