Welcome to Omni-R1! 👋 This repository provides implementation code for "Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning".
We instantiate this paradigm with Omni-R1, a two-stage SFT+RL framework featuring perception alignment loss and perception reward, thereby enabling functional image generation. Additionally, we introduce Omni-R1-Zero, which eliminates the need for multimodal annotations by bootstrapping step-wise visualizations from text-only reasoning data.
Important
Faster Evaluation & RL Rollouts with vLLM. Our evaluation and RL rollout pipelines (based on verl) are accelerated by vLLM, which can significantly reduce the inference time of large-scale sampling and long rollouts.
🧭 Two-stage training pipeline
PeSFT introduces perception alignment loss during SFT, and PeRPO applies a perception reward during RL to enhance functional image generation.
🌌 Two training regimes under different supervision
Omni-R1 uses multimodal interleaved supervision, while Omni-R1-Zero bootstraps step-wise visualizations from text-only reasoning data.
🧩 Benchmark
Includes Omni-Bench data and a vLLM-based evaluation script that runs inference efficiently and saves predictions in JSONL format.
- [2026.01] Initial release of Omni-R1.
✅ Reproducibility essentials for Omni-R1 (core code, datasets, checkpoints)
✅ Paper link
✅ Omni-Bench (data + vLLM evaluation script)
⬜ Fully end-to-end PeRPO training framework
⬜ Implementation of step-wise visualization bootstrapping
```bash
python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt
pip install ./src/transformers
```

For RL training with verl:

```bash
git clone https://github.com/volcengine/verl && cd verl
# Follow the official install docs:
# https://verl.readthedocs.io/en/latest/start/install.html
```

Tip
If you only want to run inference with our pretrained checkpoints, you can skip this training section and jump to 3. Inference.
If you only want to evaluate on our benchmark, you can go directly to 4. Omni-Bench.
- Omni-R1 supervision: Zebra-CoT (https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT)
- Omni-R1-Zero text-only CoT seeds: M3CoT (https://huggingface.co/datasets/LightChen2333/M3CoT)
Minimal DeepSpeed config:
```bash
DS_JSON='{
  "bf16": {"enabled": true},
  "zero_optimization": {"stage": 2},
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 1
}'
```

Run
```bash
export BASE_OR_CKPT=/path/to/base_or_ckpt
export OUT=checkpoints/pesft_run
export JSON_DIR=data/zebra_cot_jsonl

deepspeed --num_gpus 8 src/PeSFT/pesft.py \
  --model_path "$BASE_OR_CKPT" \
  --output_path "$OUT" \
  --json_dir "$JSON_DIR" \
  --deepspeed_config_json "$DS_JSON" \
  --learning_rate 1e-5 \
  --gradient_accumulation_steps 1 \
  --num_train_epochs 1 \
  --per_device_train_batch_size 1 \
  --mode templated  # for Omni-R1-Zero
  # --mode plain    # for Omni-R1
```

What PeSFT does
- Supervised fine-tuning with cross-entropy + perception alignment loss to stabilize functional image generation.
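As an illustration, the PeSFT objective can be sketched as a standard next-token cross-entropy plus an alignment term on image features. The cosine-distance form, the `lam` weight, and the function signature below are assumptions for exposition, not the exact formulation in `src/PeSFT/pesft.py`:

```python
import torch
import torch.nn.functional as F

def pesft_loss(logits, labels, pred_img_feats, ref_img_feats, lam=1.0):
    """Sketch of the PeSFT objective: token cross-entropy plus a
    perception alignment term (illustrative, not the repo's exact code)."""
    # Standard next-token cross-entropy; -100 marks padded label positions.
    ce = F.cross_entropy(
        logits.view(-1, logits.size(-1)), labels.view(-1), ignore_index=-100
    )
    # Perception alignment: pull generated-image features toward reference
    # features, here as mean cosine distance over the batch.
    align = 1.0 - F.cosine_similarity(pred_img_feats, ref_img_feats, dim=-1).mean()
    return ce + lam * align
```

When the predicted features already match the references, the alignment term vanishes and the loss reduces to plain cross-entropy.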
Note
The end-to-end PeRPO training recipe is being organized and will be released in a more complete form soon.
Tip
PeRPO can be reproduced by following verl’s DAPO recipe.
In volcengine/verl, you can directly follow:
verl/recipe/dapo
Then, plug in and reuse our reward functions in src/PeRPO/rewards.py as the reward module for the DAPO training loop.
Reward implementation: src/PeRPO/rewards.py
What PeRPO does
- RL refinement with group-relative optimization using a perception-calibrated reward:
- Accuracy
- Format
- Perception
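A minimal sketch of how such a composite reward could feed group-relative optimization. The weighted sum, the weights, and the normalization are illustrative assumptions; the actual reward functions live in `src/PeRPO/rewards.py`:

```python
from statistics import mean, pstdev

def composite_reward(acc, fmt, perc, w_acc=1.0, w_fmt=0.5, w_perc=0.5):
    # Weighted sum of the Accuracy, Format, and Perception terms.
    # The weights here are illustrative, not the paper's values.
    return w_acc * acc + w_fmt * fmt + w_perc * perc

def group_relative_advantages(rewards, eps=1e-6):
    # Group-relative baseline (GRPO/DAPO-style): normalize each rollout's
    # reward against the mean and std of its sampling group.
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Advantages computed this way are centered within each group, so better-than-average rollouts get positive advantages and worse ones negative.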
You can skip training and use our pretrained models below:
- Omni-R1: https://huggingface.co/ModalityDance/Omni-R1
- Omni-R1-Zero: https://huggingface.co/ModalityDance/Omni-R1-Zero
```bash
export INPUT_JSONL=/path/to/data.jsonl
export OUTDIR=outputs/demo_run
export MODEL=/path/to/ckpt
export PROCESSOR=/path/to/processor_ckpt

python src/Inference/inference.py \
  --input "$INPUT_JSONL" \
  --output-dir "$OUTDIR" \
  --model-path "$MODEL" \
  --processor-path "$PROCESSOR" \
  --do-sample \
  --temperature 1.0 \
  --top-p 0.9
```

Key arguments

- `--input`: JSONL file (or a directory of JSONL files)
- `--output-dir`: where predictions are saved
- `--model-path`: model checkpoint path
- `--processor-path`: processor checkpoint path
- `--do-sample`, `--temperature`, `--top-p`: sampling settings
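The exact JSONL schema is task-dependent; as a purely hypothetical illustration (the field names `question` and `image` are assumptions, not the script's documented schema), an input file can be assembled like this:

```python
import json

# Hypothetical records -- field names are assumptions for illustration only.
records = [
    {"question": "How many chairs are in the image?", "image": "imgs/0001.png"},
    {"question": "What is the measure of angle ABC?", "image": "imgs/0002.png"},
]

# Write one JSON object per line, which is what a JSONL input expects.
with open("data.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```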
Download the dataset: https://huggingface.co/datasets/ModalityDance/Omni-Bench
Omni-Bench contains 800 samples spanning 4 Uni-Tasks:
- Natural-Scene Perception: V*
- Structured-Image: ArxivQA, ChartQA
- Diagrammatic Math: Geometry3k, MathVista
- Vision-Operational Scenes: ViC-Bench
```bash
python omni-bench/vllm_eval.py \
  --parquet_path omni-bench/omni-bench.parquet \
  --model_path /path/to/your_model \
  --outfile preds.jsonl \
  --mm_images_per_prompt 5
```

What this script does

- Loads the Omni-Bench parquet file
- Runs batched inference with vLLM
- Saves predictions in JSONL format (`preds.jsonl`)
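Once the script finishes, the predictions can be inspected with a few lines of Python. A minimal loader sketch; the per-record fields are not specified here, so treat any particular key access as an assumption:

```python
import json

def load_preds(path):
    # Read a JSONL predictions file (one JSON object per line), skipping
    # blank lines. Record contents depend on vllm_eval.py's output schema.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```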
Omni-R1 learns to generate interleaved multimodal reasoning trajectories through a two-stage SFT → RL pipeline.
- Omni-R1: trained on annotated interleaved multimodal trajectories.
- Omni-R1-Zero: when such annotations are unavailable, bootstraps interleaved trajectories from text-only CoT by visualizing each reasoning step, then trains with the same pipeline.
- PeSFT: performs supervised fine-tuning with cross-entropy plus a perception alignment loss to stabilize functional image generation.
- PeRPO: refines the policy with group-relative RL on unified tasks using a composite reward—Accuracy, Format, and Perception.
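The data flow of the two regimes can be sketched as follows; `pesft`, `perpo`, and `bootstrap_visualizations` are hypothetical stand-ins for the real training entry points, shown only to make the pipeline explicit:

```python
def pesft(model, interleaved_data):
    # Stage 1: supervised fine-tuning (cross-entropy + perception alignment).
    return model  # stub for illustration

def perpo(model, rl_tasks):
    # Stage 2: group-relative RL with the composite reward.
    return model  # stub for illustration

def bootstrap_visualizations(text_only_cot):
    # Pair each text reasoning step with a generated visualization
    # (placeholder file names stand in for the actual images).
    return [(step, f"viz_{i}.png") for i, step in enumerate(text_only_cot)]

def train_omni_r1(model, interleaved_data, rl_tasks):
    # Two-stage SFT -> RL pipeline on interleaved supervision.
    return perpo(pesft(model, interleaved_data), rl_tasks)

def train_omni_r1_zero(model, text_only_cot, rl_tasks):
    # Bootstrap interleaved trajectories, then reuse the same pipeline.
    return train_omni_r1(model, bootstrap_visualizations(text_only_cot), rl_tasks)
```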
A high-level overview is illustrated in the figure below.
```
.
├── omni-bench/
│   ├── omni-bench.parquet        # Benchmark dataset (available on HF)
│   └── vllm_eval.py              # vLLM inference / evaluation
│
└── src/
    ├── Inference/
    │   └── inference.py          # Inference
    │
    ├── PeRPO/
    │   └── rewards.py            # Perception reward utilities
    │
    ├── PeSFT/
    │   ├── perception.py         # Perception module
    │   ├── perception_module.ckpt  # Perception module checkpoint
    │   ├── pesft.py              # PeSFT training
    │   └── trainer.py            # Training utilities
    │
    └── transformers/             # Patched transformers (installed via pip)
```
We would like to thank the contributors, open-source projects, and research communities whose work made Omni-R1 possible.
This project is licensed under the MIT License. It also complies with the licenses of referenced third-party projects and dependencies, including the Chameleon Research License. Please refer to the LICENSE file for more details.
If you use Omni-R1 in your research or applications, please consider citing:
```bibtex
@misc{cheng2026omnir1unifiedgenerativeparadigm,
      title={Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning},
      author={Dongjie Cheng and Yongqi Li and Zhixin Ma and Hongru Cai and Yupeng Hu and Wenjie Wang and Liqiang Nie and Wenjie Li},
      year={2026},
      eprint={2601.09536},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.09536},
}
```
