Mixture of Experts

22 items

DOC · ↑ trending · Reddit r/LocalLLaMA · 27d ago

AIDC-AI/Ovis2.6-80B-A3B · Hugging Face

Ovis2.6-80B-A3B is introduced as the latest advancement in Multimodal Large Language Models (MLLMs), upgrading to a Mixture-of-Experts (MoE) architecture for superior multimodal performance at reduced serving costs. It also brings significant improvements in long-context and high-resolution understanding, visual reasoning, and information-dense document comprehension.

AI models, multimodal AI, Mixture of Experts, large language models

ARTICLE · ↑ trending · Hacker News (AI) · 11d ago

Liquid AI reveals 8B-A1B MoE trained on 38T

Liquid AI has unveiled a new 8B-A1B MoE model trained on 38 trillion tokens.

AI models, Mixture of Experts, large language models, AI development

ARTICLE · DEV.to AI · 4/11/2026

A Review of Sparse Expert Models in Deep Learning

This review covers sparse expert models in deep learning, an architecture family central to scaling large neural networks efficiently. It surveys their applications and their impact on the field.

neural networks, deep learning, Sparse Models, AI Architectures

RESEARCH · arXiv CS.AI · 5/9/2026

ZAYA1-8B Technical Report

ZAYA1-8B is a reasoning-focused mixture-of-experts (MoE) model with 700M active parameters, outperforming DeepSeek-R1-0528 on math and coding benchmarks. It was trained from scratch for reasoning on an AMD platform and uses a four-stage RL cascade for post-training.

AI models, AI training, machine learning, Benchmarking

RESEARCH · DEV.to AI · 25d ago

Shared expert pool reduces parameters while maintaining performance

Conventional Mixture-of-Experts designs increase parameters linearly with depth by assigning private expert sets to each transformer layer. A new approach, UniPool, replaces this with a single, globally shared pool of experts that all routers draw from, significantly reducing the total expert parameter count while maintaining comparable predictive quality.

Parameter efficiency, Deep learning architecture, AI optimization, Mixture of Experts
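
The contrast drawn in the UniPool summary, private per-layer expert sets versus one shared pool, can be illustrated with a back-of-the-envelope parameter count. This is only a sketch: the layer count, pool size, and dimensions below are hypothetical values chosen for illustration, each expert is assumed to be a plain two-matrix feed-forward block, and router parameters are ignored. None of these figures come from the UniPool work itself.

```python
def expert_params(d_model: int, d_ff: int) -> int:
    """Weights in one two-matrix feed-forward expert (biases ignored)."""
    return 2 * d_model * d_ff


def private_expert_total(layers: int, experts_per_layer: int,
                         d_model: int, d_ff: int) -> int:
    """Conventional layout: every layer owns its own expert set,
    so the total grows linearly with depth."""
    return layers * experts_per_layer * expert_params(d_model, d_ff)


def shared_pool_total(pool_size: int, d_model: int, d_ff: int) -> int:
    """Shared layout: one pool that all layers' routers draw from,
    so the total does not depend on depth."""
    return pool_size * expert_params(d_model, d_ff)


# Hypothetical sizes, chosen only for illustration.
LAYERS, EXPERTS_PER_LAYER, POOL_SIZE = 24, 8, 64
D_MODEL, D_FF = 1024, 4096

private = private_expert_total(LAYERS, EXPERTS_PER_LAYER, D_MODEL, D_FF)
shared = shared_pool_total(POOL_SIZE, D_MODEL, D_FF)
print(f"private: {private:,}  shared: {shared:,}  ratio: {shared / private:.2f}")
```

Under these assumed sizes the shared pool holds a third of the expert weights; whether quality stays comparable is an empirical question the sketch does not address.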

ARTICLE · DEV.to AI · 5/9/2026

EMO Sparks AI Breakthrough with Pretraining Mixture of Experts

EMO introduces emergent modularity through a mixture of experts, significantly cutting AI training costs and enhancing model adaptability. This approach has the potential to reshape machine learning by making models more efficient and adaptable, opening doors for advances in transfer learning.

machine learning, EMO, Mixture of Experts, AI

RESEARCH · DEV.to AI · 4/17/2026

Qwen3.6-35B-A3B Complete Review: Alibaba's Open-Source Coding Model That Beats Frontier Giants

Qwen3.6-35B-A3B is Alibaba's new open-source sparse Mixture-of-Experts (MoE) model, offering high efficiency for local deployment with 3B active parameters per token. Released under Apache 2.0, it outperforms dense 27B-param models and competes with frontier giants on coding benchmarks.

open-source AI, Benchmarking, coding AI, Mixture of Experts

RESEARCH · arXiv CS.LG · 4/17/2026

Awakening Dormant Experts: Counterfactual Routing to Mitigate MoE Hallucinations

Mixture-of-Experts (MoE) models are prone to hallucinations, particularly for long-tail knowledge, because static Top-k routing under-prioritizes specialist experts. Counterfactual Routing (CoR) is proposed as a training-free inference framework that uses perturbation analysis and CEI to dynamically shift computational resources, thereby awakening these dormant experts.

neural networks, AI hallucinations, deep learning, Mixture of Experts
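
For readers unfamiliar with the static Top-k routing that the summary above refers to, here is a minimal sketch of the generic mechanism: softmax over router logits, keep the k largest, renormalize. It is not the Counterfactual Routing method itself, and the logits and the helper name `top_k_route` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits, k):
    """Return {expert_index: weight} for the k highest-scoring experts,
    with the kept weights renormalized to sum to 1."""
    probs = softmax(logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


# Hypothetical router logits for one token over four experts.
# Expert 3 stands in for a specialist whose score falls just short of the cut.
logits = [2.0, 1.9, 0.1, 1.8]
routes = top_k_route(logits, k=2)
print(routes)
```

With k fixed at 2, expert 3 receives no compute for this token even though its score is close to the selected ones, which is the kind of cutoff effect the summary describes.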

NEWS · DEV.to AI · 18d ago

Qwen3-Coder-Next: 80B total, 3B active, 70.6 on SWE-Bench

Qwen3-Coder-Next is a sparse Mixture-of-Experts (MoE) model with 80B total and 3B active parameters that scores 70.6 on SWE-Bench Verified. It is a coding-tuned variant of the Qwen3-Next-80B-A3B base, with a hybrid attention mechanism and Apache 2.0 weights.

Benchmarking, code generation, Mixture of Experts, large language models

ARTICLE · DEV.to AI · 15d ago

GLM-4: The Chinese-English Bilingual Workhorse You Didn't Know You Needed

GLM-4 is a Chinese-English bilingual AI model from Tsinghua University / Zhipu AI, optimized from the ground up for both languages, unlike most English-centric models. It features a Mixture of Experts architecture for fast inference, long context up to 128K tokens, and a focus on function calling and agent workflows.

bilingual AI, Function Calling, Natural Language Processing, Mixture of Experts

RESEARCH · Hugging Face Blog · 5/8/2026

EMO: Pretraining mixture of experts for emergent modularity

EMO proposes a pretraining approach for Mixture of Experts (MoE) models, aiming to achieve emergent modularity. This method focuses on developing specialized components within the model during the pretraining phase.

Emergent Modularity, AI models, pretraining, machine learning

RESEARCH · arXiv CS.LG · 4/6/2026

LiME: Lightweight Mixture of Experts for Efficient Multimodal Multi-task Learning

LiME (Lightweight Mixture of Experts) proposes a new approach to MoE-PEFT that applies lightweight modulation to a single shared PEFT module instead of using separate per-expert adapters. This significantly reduces parameters, introduces zero-parameter routing, and generalizes to any PEFT method, overcoming limitations in scalability and applicability.

multi-task learning, model efficiency, Deep Learning Architectures, Mixture of Experts

RESEARCH · arXiv CS.LG · 19d ago

CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning

CP-MoE addresses catastrophic forgetting in continual learning for LLMs and VLMs using Mixture-of-Experts architectures. It introduces a transient expert and consistency-preserving routing to integrate new knowledge while preventing the overwriting of existing parameters.

LLMs, VLMs, learning, Mixture of Experts

RESEARCH · arXiv CS.AI · 4/17/2026

Equifinality in Mixture of Experts: Routing Topology Does Not Determine Language Modeling Quality

This paper investigates whether routing topology actually determines language modeling quality in Mixture-of-Experts (MoE) architectures. The authors found that different routing variants, including a novel cosine-similarity based one, result in statistically equivalent asymptotic perplexity, suggesting that routing design has a smaller impact on final quality than previously thought.

neural networks, routing algorithms, Mixture of Experts, Language modeling
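
To make the idea of a cosine-similarity router concrete, the sketch below compares two generic scoring rules: a plain dot product against per-expert key vectors, and the same score normalized by vector lengths. This is an assumption about what such a router might look like in general, not the paper's formulation; the key vectors and function names are hypothetical.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity: the dot product normalized by both vector lengths."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def pick_expert(token, expert_keys, score):
    """Index of the expert whose key scores highest for this token."""
    scores = [score(token, key) for key in expert_keys]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical token and expert keys. Key 0 has a large norm but points
# only partly toward the token; key 1 is small but nearly aligned with it.
token = [1.0, 0.0]
expert_keys = [[10.0, 10.0], [1.0, 0.1]]

by_dot = pick_expert(token, expert_keys, dot)
by_cosine = pick_expert(token, expert_keys, cosine)
print(by_dot, by_cosine)
```

The two rules pick different experts here because cosine scoring ignores key magnitude. The summary's finding is that such differences in routing design leave asymptotic perplexity statistically equivalent.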

RESEARCH · arXiv CS.LG · 5/7/2026

MP-ISMoE: Mixed-Precision Interactive Side Mixture-of-Experts for Efficient Transfer Learning

This research introduces MP-ISMoE, a Mixed-Precision Interactive Side Mixture-of-Experts framework, to enhance parameter-efficient transfer learning by mitigating memory overhead. It employs a Gaussian Noise Perturbed Iterative Quantization (GNP-IQ) scheme for lower-bit weight quantization, freeing up memory to improve side network learning capacity and performance.

model efficiency, learning, Transfer Learning, quantization

RESEARCH · arXiv CS.CL · 27d ago

HEBATRON: A Hebrew-Specialized Open-Weight Mixture-of-Experts Language Model

Hebatron is a Hebrew-specialized open-weight large language model built on NVIDIA's Nemotron-3 Mixture-of-Experts (MoE) architecture. It achieves a 73.8% Hebrew reasoning average, outperforming competitors and offering significantly higher inference throughput by activating fewer parameters per pass.

language models, NVIDIA AI, Hebrew AI, Mixture of Experts

RESEARCH · arXiv CS.LG · 12d ago

Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey

This paper presents a survey addressing multimodal learning challenges with the Mixture-of-Experts (MoE) architecture. The study explores how MoE functions as an efficient engine and a representation learner for integrating diverse data modalities. It fills a gap in the literature by offering a comprehensive and systematic review on the topic.

multimodal learning, Survey, Mixture of Experts, AI

ARTICLE · DEV.to AI · 4/14/2026

MiniMax M2 on OpenClaw: Setup, Pricing, and Performance...

The article describes MiniMax's M2 family of large language models, utilizing a Mixture of Experts architecture for high performance at low inference cost. The M2.7 model achieves 90% of frontier model quality at 7% of the cost, with benchmark results comparable to Claude Sonnet 4.

OpenClaw, AI performance, Mixture of Experts, MiniMax M2

RESEARCH · arXiv CS.LG · 5/6/2026

Agentic AI-Based Joint Computing and Networking via Mixture of Experts and Large Language Models

This paper proposes an agentic artificial intelligence (AI)-based network optimization framework that integrates mixture of experts (MoE) architectures with large language models (LLMs). The LLM acts as a semantic gate to reason over operator objectives and dynamically compose suitable optimization agents for 6G mobile networks.

Network Optimization, 6G Networks, Agentic AI, Mixture of Experts

DOC · Hugging Face (YouTube) · 4/15/2026

What are Mixture-of-Experts Models | ft. Aritra

This video explains Mixture-of-Experts (MoE) models, a neural network architecture that combines multiple 'experts', each processing different parts of the data. Featuring Aritra, it covers how these models work and where they are applied in artificial intelligence.

AI models, machine learning, Mixture of Experts
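
The idea in the summary above, multiple experts each handling different parts of the data, can be sketched as a toy layer that sends every token to exactly one expert. Everything here is a hypothetical illustration: the experts are trivial functions rather than learned networks, the router weights are hand-picked, and top-1 dispatch is just one of several possible routing choices.

```python
def route(token, router_weights):
    """Score the token against each expert's router row; return the best index."""
    scores = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    return max(range(len(scores)), key=scores.__getitem__)


def moe_layer(tokens, router_weights, experts):
    """Top-1 dispatch: each token is processed by the single expert it routes to."""
    return [experts[route(t, router_weights)](t) for t in tokens]


# Two stand-in experts (a real model would use small feed-forward networks).
experts = [
    lambda x: [2.0 * v for v in x],  # expert 0: doubles every feature
    lambda x: [v + 1.0 for v in x],  # expert 1: shifts every feature by one
]

# Hand-picked router: expert 0 scores feature 0, expert 1 scores feature 1.
router_weights = [[1.0, 0.0], [0.0, 1.0]]

tokens = [[3.0, 1.0], [0.5, 2.0]]
outputs = moe_layer(tokens, router_weights, experts)
print(outputs)
```

Only one expert runs per token, which is why the active parameter count of an MoE model can be far smaller than its total.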
