Tiansheng Wen*1,2,
Yifei Wang*3,
Aosong Feng4,
Long Ma2,
Xinyang Liu5,
Yifan Wang1,
Lixuan Guo2,
Bo Chen2,
Stefanie Jegelka6,3,
Chenyu You1
1Stony Brook University
2Xidian University
3MIT
4Yale University
5University of Texas at Austin
6TU Munich
- 2025-11-09: We uploaded our work to arXiv.
NOTE: This repo is built heavily on top of Densemixer. We follow their training paradigm with minor modifications to the MoE routing process.
Currently, SeqTopK supports two MoE models: OLMoE (7B parameters) and Qwen1.5-MoE (14B parameters).
We run experiments with open-instruct on the OLMoE model.
Set up your environment as follows:
cd experiments/train/open-instruct
conda create -n openinstruct python=3.12
conda activate openinstruct
conda install -c conda-forge cuda-nvcc=12.1 -y
bash init_env.sh
To activate Densemixer and SeqTopK, one additional setup step is required:
conda activate openinstruct
pip install densemixer
densemixer setup
For convenience, we use environment variables for configuration:
| Variable | Description | Default |
|---|---|---|
| `DENSEMIXER_ENABLED` | Master switch (set to `1` to enable) | `0` |
| `DENSEMIXER_QWEN2` | Enable for Qwen1.5-MoE models | `1` |
| `DENSEMIXER_OLMOE` | Enable for OLMoE models | `1` |
| `DENSEMIXER_TOPK_MODE` | Routing type (`seq_topk` for sequence-level, `topk` for token-level) | `seq_topk` |
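Putting the table together, a typical configuration for sequence-level routing on OLMoE could look like the following sketch (the variable names come from the table above; the values are illustrative):

```shell
# Enable the Densemixer master switch (default is 0, i.e. disabled).
export DENSEMIXER_ENABLED=1
# Patch OLMoE models (default: 1).
export DENSEMIXER_OLMOE=1
# Use sequence-level top-k routing; set to "topk" for token-level routing.
export DENSEMIXER_TOPK_MODE=seq_topk
```

Set these in the shell (or training script) before launching training so the patched routing path is picked up.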
We use the datasets from DeepSeek's ESFT paper for both training and evaluation. Pre-processed datasets are available on Hugging Face: GSM (math reasoning), CodeAlpaca (code generation), ESFT-law (legal reasoning), ESFT-summary (summarization).
For code generation evaluation, we use MBPP and HumanEval.
We support the following fine-tuning methods. For each method, replace {dataset_name} with your target dataset (gsm, codealpaca, law, summary).
cd experiments/train/open-instruct
export WANDB_API_KEY="YOUR_WANDB_API_KEY"
bash run/densemixer_full/train_{dataset_name}.sh
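As a concrete example, substituting `gsm` for `{dataset_name}` gives the following invocation (the W&B key is a placeholder you must replace):

```shell
cd experiments/train/open-instruct
export WANDB_API_KEY="YOUR_WANDB_API_KEY"  # replace with your actual key
bash run/densemixer_full/train_gsm.sh      # {dataset_name} = gsm
```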
Evaluation scripts are in the eval/ directory. See eval/README.md for details and environment setup.
We run experiments with LLaMA-Factory on the Qwen1.5-MoE model.
Set up your environment as follows:
cd experiments/train/LLaMA-Factory
bash installation.sh
To activate Densemixer and SeqTopK, one additional setup step is required:
conda activate openinstruct
pip install densemixer
densemixer setup
We use the same datasets as above: GSM (math reasoning), CodeAlpaca (code generation), ESFT-law (legal reasoning), and ESFT-summary (summarization). For code generation evaluation, we use MBPP and HumanEval.
We support the following fine-tuning methods. For each method, replace {dataset_name} with your target dataset (gsm, codealpaca, law, summary).
cd experiments/train/LLaMA-Factory
export WANDB_API_KEY="YOUR_WANDB_API_KEY"
bash run/qwen1.5/densemixer_full/train_{dataset_name}_densemixer.sh
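For instance, substituting `codealpaca` for `{dataset_name}` yields (the W&B key is a placeholder you must replace):

```shell
cd experiments/train/LLaMA-Factory
export WANDB_API_KEY="YOUR_WANDB_API_KEY"                    # replace with your actual key
bash run/qwen1.5/densemixer_full/train_codealpaca_densemixer.sh  # {dataset_name} = codealpaca
```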
Evaluation scripts are in the eval/ directory. See eval/README.md for details and environment setup.
If you find this work useful, please cite the accompanying paper: