RESEARCH · arXiv cs.LG · 4/23/2026
Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts
Expert Upcycling is a method for progressively expanding Mixture-of-Experts (MoE) capacity in large language models during continued pre-training. It increases the number of experts by duplicating existing ones and extending the router, which gives the larger model a warm initialization. The aim is to reduce training cost while preserving per-token inference cost.
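A minimal sketch of the duplication-plus-router-extension idea, not the paper's implementation. It assumes a top-k router with one weight row per expert; the function names `upcycle` and `top_k_experts`, the `factor` and `noise_std` parameters, and the toy weights are all hypothetical and chosen only for illustration.

```python
import copy
import random


def upcycle(experts, router_rows, factor=2, noise_std=0.0, seed=0):
    """Duplicate every expert `factor` times and extend the router to match.

    experts:     list of per-expert parameters (here: plain nested lists)
    router_rows: list of router weight vectors, one row per expert
    All names and defaults are illustrative assumptions, not from the paper.
    """
    rng = random.Random(seed)
    new_experts, new_rows = [], []
    for expert, row in zip(experts, router_rows):
        for _ in range(factor):
            # Deep copy so the clones can diverge during later training.
            new_experts.append(copy.deepcopy(expert))
            # Copy the parent's router row; optional noise perturbs the clones.
            new_rows.append([w + rng.gauss(0.0, noise_std) for w in row])
    return new_experts, new_rows


def top_k_experts(router_rows, token, k):
    """Return the indices of the k experts with the highest router logit."""
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_rows]
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[:k]


# Toy model: 4 experts, 2-dimensional tokens, 2 experts active per token.
experts = [[[0.1, 0.2]], [[0.3, 0.4]], [[0.5, 0.6]], [[0.7, 0.8]]]
router_rows = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

new_experts, new_rows = upcycle(experts, router_rows, factor=2)
token = [0.9, 0.1]
active = top_k_experts(new_rows, token, k=2)
```

In this toy setup the expert count grows from 4 to 8 while `k` stays at 2, so the number of experts evaluated per token does not change. Each clone starts from its parent's weights and router row, which is the sense in which the expanded model is warm-started rather than freshly initialized.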