RESEARCH · DEV.to AI · 25d ago
Shared expert pool reduces parameters while maintaining performance
Conventional Mixture-of-Experts (MoE) designs give each transformer layer its own private set of experts, so the expert parameter count grows linearly with depth. A new approach, UniPool, replaces these per-layer sets with a single, globally shared pool of experts that every layer's router draws from, significantly reducing the total expert parameter count while maintaining comparable predictive quality.
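To make the parameter argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from UniPool itself: the function names, the two-matrix feed-forward expert, and every size used below (12 layers, 8 experts per layer, a pool of 32, `d_model=512`, `d_ff=2048`) are illustrative assumptions chosen only to show how the two layouts scale.

```python
# Illustrative parameter arithmetic only. All sizes and names here are
# assumptions for the sake of the example, not figures reported for UniPool.

def expert_params(d_model: int, d_ff: int) -> int:
    """Weights in one feed-forward expert, assumed to be an up projection
    (d_model x d_ff) plus a down projection (d_ff x d_model), biases ignored."""
    return 2 * d_model * d_ff


def per_layer_moe_params(num_layers: int, experts_per_layer: int,
                         d_model: int, d_ff: int) -> int:
    """Conventional layout: every layer owns a private expert set,
    so the total grows linearly with the number of layers."""
    return num_layers * experts_per_layer * expert_params(d_model, d_ff)


def shared_pool_params(pool_size: int, d_model: int, d_ff: int) -> int:
    """Shared layout: one global pool serves all layers,
    so the expert total does not depend on depth."""
    return pool_size * expert_params(d_model, d_ff)


if __name__ == "__main__":
    d_model, d_ff = 512, 2048  # hypothetical model dimensions
    private = per_layer_moe_params(12, 8, d_model, d_ff)
    shared = shared_pool_params(32, d_model, d_ff)  # hypothetical pool size
    print(f"private per-layer experts: {private:,}")
    print(f"shared pool experts:       {shared:,}")
    print(f"reduction factor:          {private / shared:.1f}x")
```

Under these assumed sizes, doubling the depth doubles the per-layer total but leaves the shared-pool total unchanged. Each layer still needs its own router, and router weights are not counted here; how large the pool must be to keep quality comparable is an empirical question this sketch does not address.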