Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts
arXiv cs.LG · April 23, 2026
Expert Upcycling proposes a method for progressively expanding Mixture-of-Experts (MoE) capacity in large language models during continued pre-training. It grows the number of experts by duplicating existing experts and extending the router to cover the copies. This gives the added experts a warm initialization instead of starting them from scratch. The aim is to reduce training cost while keeping per-token inference cost unchanged.
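The summary does not spell out the paper's exact procedure, so what follows is only a minimal sketch under stated assumptions. Each expert is simplified to a single square linear map. The router is one logit row per expert. Routing is top-1 with a gate weight of 1. The helper names (`upcycle_experts`, `moe_top1`) and the optional router noise are hypothetical and are not taken from the paper. The sketch shows why duplicating experts and extending the router gives a warm start: before any further training, the expanded layer returns the same output as the original one. Each token still activates exactly one expert, so per-token compute does not grow.

```python
import copy
import random
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply matrix m by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def upcycle_experts(
    experts: List[Matrix],
    router: List[Vector],
    factor: int = 2,
    noise_std: float = 0.0,
    seed: int = 0,
) -> Tuple[List[Matrix], List[Vector]]:
    """Hypothetical helper: grow an MoE layer from E to factor * E experts.

    Copy j of expert i is placed at index i + j * E. Its router row is copied
    alongside it, so the router is extended to score the new experts.
    """
    rng = random.Random(seed)
    num_experts = len(experts)
    total = factor * num_experts
    # Duplicate expert weights (deep copies, so copies can be trained independently).
    new_experts = [copy.deepcopy(experts[i % num_experts]) for i in range(total)]
    # Extend the router with a copy of each source expert's logit row.
    new_router = [list(router[i % num_experts]) for i in range(total)]
    if noise_std > 0:
        # Assumption, not from the source: perturb only the copied router rows
        # so that duplicates can diverge during continued pre-training.
        for i in range(num_experts, total):
            new_router[i] = [w + rng.gauss(0.0, noise_std) for w in new_router[i]]
    return new_experts, new_router


def moe_top1(x: Vector, experts: List[Matrix], router: List[Vector]) -> Vector:
    """Route x to its single highest-scoring expert (gate weight fixed at 1)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    # max() returns the first maximal index on ties, i.e. the original expert.
    best = max(range(len(logits)), key=lambda i: logits[i])
    return matvec(experts[best], x)
```

Under these assumptions the upcycled layer is a drop-in replacement at initialization. Even with router noise, the output stays the same whenever the top-scoring expert does not change, because every copy carries its source expert's weights. With top-k routing for k > 1, naive duplication does not preserve outputs in general. For example, a token whose original top-2 experts were a and b may now select a and a's copy. That is one reason to treat the copies as a warm start for continued pre-training rather than as an exact equivalence.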