
Central Committee Governs Routing in MoE Models

Authors: Yan Wang
Organization: Hugging Face
Published: 2026-01-09 • Added: 2026-01-09

Key Insights

  • Across diverse domains and architectures, a tiny, fixed subset of experts (the “standing committee”) receives the majority of routing votes, contradicting the expected domain‑specific specialization.
  • This committee forms early in training, remains stable throughout fine‑tuning, and its dominance is largely independent of model size or the number of experts.
  • Domain‑specific experts do exist, but they contribute marginally and are often “shadowed” by the committee due to routing bias introduced by initialization and capacity constraints.
  • Removing or re‑weighting the committee during training leads to more balanced expert utilization and can improve downstream task performance on out‑of‑distribution data.
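The routing concentration described above is easy to visualize with a toy simulation. The sketch below assumes a standard top-k softmax-style router and injects an artificial logit bias toward a few experts to mimic the "standing committee" effect; the committee indices and bias magnitude are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
num_tokens, num_experts, top_k = 10_000, 64, 2

# Hypothetical committee: a small, fixed subset of experts that the
# router favors. The +3.0 bias simulates the routing bias the paper
# attributes to initialization and capacity constraints.
committee = [3, 17, 41]
logits = rng.normal(size=(num_tokens, num_experts))
logits[:, committee] += 3.0

# Standard top-k routing: each token "votes" for its k highest-scoring experts.
topk_idx = np.argsort(logits, axis=1)[:, -top_k:]
votes = np.bincount(topk_idx.ravel(), minlength=num_experts)

# Fraction of all routing votes captured by the committee.
committee_share = votes[committee].sum() / votes.sum()
print(f"committee share of routing votes: {committee_share:.2%}")
```

Even with only 3 of 64 experts biased, the committee captures the large majority of votes, which is the concentration pattern the paper reports across domains.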

Abstract

This work challenges the assumption of domain specialization in Mixture-of-Experts (MoE) models by identifying a persistent "central committee" of experts that dominates routing behavior across different domains and architectures.
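The mitigation mentioned in the key insights (re-weighting the committee to rebalance expert utilization) can also be sketched as a toy simulation. The mean-subtraction scheme below is one simple assumption for illustration, not the paper's actual intervention: it removes any constant per-expert advantage from the router logits before top-k selection.

```python
import numpy as np

rng = np.random.default_rng(1)
num_tokens, num_experts, top_k = 10_000, 64, 2

committee = [3, 17, 41]                      # hypothetical dominant experts
logits = rng.normal(size=(num_tokens, num_experts))
logits[:, committee] += 3.0                  # simulated routing bias

def vote_counts(l):
    """Count top-k routing votes per expert."""
    idx = np.argsort(l, axis=1)[:, -top_k:]
    return np.bincount(idx.ravel(), minlength=num_experts)

before = vote_counts(logits)

# Illustrative re-weighting: subtract each expert's mean logit so no
# expert carries a constant head start into the top-k selection.
rebalanced = logits - logits.mean(axis=0, keepdims=True)
after = vote_counts(rebalanced)

def share(v):
    return v[committee].sum() / v.sum()

print(f"committee share before: {share(before):.2%}, after: {share(after):.2%}")
```

After the correction, the committee's vote share drops toward its proportional baseline (3/64 of experts), i.e. the utilization becomes far more balanced, matching the direction of the effect reported in the paper.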

Source

[HuggingFace](https://huggingface.co/papers/2601.03425) | [arXiv](https://arxiv.org/abs/2601.03425)

Topics: ai-ml • Difficulty: advanced • Upvotes: 8