
Mixture of Experts

22 items

DOC·↑ trending·Reddit r/LocalLLaMA·27d ago

AIDC-AI/Ovis2.6-80B-A3B · Hugging Face

Ovis2.6-80B-A3B is a new Multimodal Large Language Model (MLLM) that moves to a Mixture-of-Experts (MoE) architecture, aiming for stronger multimodal performance at lower serving cost. It also improves long-context and high-resolution understanding, visual reasoning, and comprehension of information-dense documents.

44
RESEARCH·arXiv CS.AI·5/9/2026

ZAYA1-8B Technical Report

ZAYA1-8B is a reasoning-focused mixture-of-experts (MoE) model with 700M active parameters, outperforming DeepSeek-R1-0528 on math and coding benchmarks. It was trained from scratch for reasoning on an AMD platform and uses a four-stage RL cascade for post-training.

29
RESEARCH·DEV.to AI·25d ago

Shared expert pool reduces parameters while maintaining performance

Conventional Mixture-of-Experts designs increase parameters linearly with depth by assigning private expert sets to each transformer layer. A new approach, UniPool, replaces this with a single, globally shared pool of experts that all routers draw from, significantly reducing the total expert parameter count while maintaining comparable predictive quality.

29
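The parameter saving described in the shared-pool summary above can be illustrated with simple arithmetic. This is a minimal sketch, not UniPool's actual method: the function names, the two-matrix expert shape, and every size used below are assumptions chosen for illustration only.

```python
def expert_params(d_model: int, d_ff: int) -> int:
    """Weights in one feed-forward expert (assumed: up and down projection, no biases)."""
    return 2 * d_model * d_ff


def per_layer_total(n_layers: int, experts_per_layer: int, d_model: int, d_ff: int) -> int:
    """Conventional design: every layer owns a private set of experts."""
    return n_layers * experts_per_layer * expert_params(d_model, d_ff)


def shared_pool_total(pool_size: int, d_model: int, d_ff: int) -> int:
    """Shared design: one global pool that the routers of all layers draw from."""
    return pool_size * expert_params(d_model, d_ff)


# Hypothetical configuration, not taken from the source.
private = per_layer_total(n_layers=12, experts_per_layer=8, d_model=512, d_ff=2048)
shared = shared_pool_total(pool_size=32, d_model=512, d_ff=2048)
print(private, shared, private / shared)
```

With private experts the count grows linearly with the number of layers, while the shared pool depends only on the pool size, which is why depth no longer multiplies the expert parameters.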
RESEARCH·arXiv CS.LG·4/17/2026

Awakening Dormant Experts: Counterfactual Routing to Mitigate MoE Hallucinations

Mixture-of-Experts (MoE) models are prone to hallucinations, particularly for long-tail knowledge, because static Top-k routing under-prioritizes specialist experts. Counterfactual Routing (CoR) is proposed as a training-free inference framework that uses perturbation analysis and CEI to dynamically shift computational resources, thereby awakening these dormant experts.

28
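For context on the static Top-k routing that the summary above criticizes, here is a minimal sketch of the baseline only. It does not implement Counterfactual Routing; the function name `top_k_route` and the example logits are hypothetical.

```python
import math


def top_k_route(logits: list[float], k: int) -> dict[int, float]:
    """Keep the k experts with the highest router logits and
    renormalize their scores with a softmax over the kept set."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: exps[i] / total for i in top}


# Hypothetical router logits for four experts; expert 3 scores lowest.
weights = top_k_route([2.0, 0.5, 1.0, -1.0], k=2)
print(weights)
```

Because k is fixed, an expert whose logit falls just below the cutoff receives no compute at all for that token, which is the sense in which a specialist expert can stay dormant.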
RESEARCH·arXiv CS.LG·4/6/2026

LiME: Lightweight Mixture of Experts for Efficient Multimodal Multi-task Learning

LiME (Lightweight Mixture of Experts) proposes a new approach to MoE-PEFT that applies lightweight modulation to a single shared PEFT module instead of using separate adapters per expert. This significantly reduces parameters, introduces zero-parameter routing, and generalizes to any PEFT method, overcoming limitations in scalability and applicability.

27
RESEARCH·arXiv CS.AI·4/17/2026

Equifinality in Mixture of Experts: Routing Topology Does Not Determine Language Modeling Quality

This paper investigates whether routing topology actually determines language modeling quality in Mixture-of-Experts (MoE) architectures. The authors found that different routing variants, including a novel cosine-similarity based one, result in statistically equivalent asymptotic perplexity, suggesting that routing design has a smaller impact on final quality than previously thought.

27
RESEARCH·arXiv CS.LG·5/7/2026

MP-ISMoE: Mixed-Precision Interactive Side Mixture-of-Experts for Efficient Transfer Learning

This research introduces MP-ISMoE, a Mixed-Precision Interactive Side Mixture-of-Experts framework, to enhance parameter-efficient transfer learning by mitigating memory overhead. It employs a Gaussian Noise Perturbed Iterative Quantization (GNP-IQ) scheme for lower-bit weight quantization, freeing up memory to improve side network learning capacity and performance.

27
RESEARCH·arXiv CS.LG·5/6/2026

Agentic AI-Based Joint Computing and Networking via Mixture of Experts and Large Language Models

This paper proposes an agentic artificial intelligence (AI)-based network optimization framework that integrates mixture of experts (MoE) architectures with large language models (LLMs). The LLM acts as a semantic gate to reason over operator objectives and dynamically compose suitable optimization agents for 6G mobile networks.

27