ModalityDance/Omni-R1

Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning


Welcome to Omni-R1! 👋 This repository provides implementation code for "Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning".

We instantiate this paradigm with Omni-R1, a two-stage SFT+RL framework featuring perception alignment loss and perception reward, thereby enabling functional image generation. Additionally, we introduce Omni-R1-Zero, which eliminates the need for multimodal annotations by bootstrapping step-wise visualizations from text-only reasoning data.

🪐 Key Features

Important

Faster Evaluation & RL Rollouts with vLLM. Our evaluation and RL rollout pipelines (built on verl) are accelerated by vLLM, significantly reducing inference time for large-scale sampling and long rollouts.

🧭 Two-stage training pipeline
PeSFT introduces perception alignment loss during SFT, and PeRPO applies a perception reward during RL to enhance functional image generation.

🌌 Two training regimes under different supervision
Omni-R1 uses multimodal interleaved supervision, while Omni-R1-Zero bootstraps step-wise visualizations from text-only reasoning data.

🧩 Benchmark
Includes Omni-Bench data and a vLLM-based evaluation script that runs inference efficiently and saves predictions in JSONL format.

🔥 News

  • [2026.01] Initial release of Omni-R1.

Roadmap

✅ Reproducibility essentials for Omni-R1 (core code, datasets, checkpoints)
✅ Paper link
✅ Omni-Bench (data + vLLM evaluation script)
⬜ Fully end-to-end PeRPO training framework
⬜ The implementation of bootstrapping step-wise visualizations

🚀 Quick Start

1. Installation

Create environment (For Inference & PeSFT)

python -m venv .venv
source .venv/bin/activate

pip install -U pip
pip install -r requirements.txt
pip install ./src/transformers
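After installation, a quick import check can confirm the environment is usable, including the bundled transformers fork. The package list below is an assumption based on the dependencies above; adjust it to match your requirements.txt:

```python
import importlib

def check_env(packages=("torch", "transformers", "deepspeed")):
    """Return {package: version or None}, without raising on missing deps."""
    versions = {}
    for pkg in packages:
        try:
            mod = importlib.import_module(pkg)
            versions[pkg] = getattr(mod, "__version__", "unknown")
        except ImportError:
            versions[pkg] = None
    return versions

if __name__ == "__main__":
    for pkg, ver in check_env().items():
        print(f"{pkg}: {ver or 'NOT INSTALLED'}")
```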

PeRPO dependency

git clone https://github.com/volcengine/verl && cd verl
# Follow the official install docs:
# https://verl.readthedocs.io/en/latest/start/install.html

2. Train

Tip

If you only want to run inference with our pretrained checkpoints, you can skip this training section and jump to 3. Inference.
If you only want to evaluate on our benchmark, you can go directly to 4. Omni-Bench.

Data

PeSFT

Minimal DeepSpeed config:

DS_JSON='{
  "bf16": {"enabled": true},
  "zero_optimization": {"stage": 2},
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 1
}'

Run

export BASE_OR_CKPT=/path/to/base_or_ckpt
export OUT=checkpoints/pesft_run
export JSON_DIR=data/zebra_cot_jsonl

deepspeed --num_gpus 8 src/PeSFT/pesft.py \
  --model_path "$BASE_OR_CKPT" \
  --output_path "$OUT" \
  --json_dir "$JSON_DIR" \
  --deepspeed_config_json "$DS_JSON" \
  --learning_rate 1e-5 \
  --gradient_accumulation_steps 1 \
  --num_train_epochs 1 \
  --per_device_train_batch_size 1 \
  --mode templated # for Omni-R1-Zero
  # --mode plain # for Omni-R1
What PeSFT does:
  • Supervised finetuning with cross-entropy + perception alignment loss to stabilize functional image generation.
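The exact JSONL schema under `--json_dir` is defined by the training script, so this sketch stays schema-agnostic: it only verifies that every line in each `.jsonl` file parses as JSON, which is the minimum any JSONL loader requires.

```python
import json
from pathlib import Path

def scan_jsonl_dir(json_dir):
    """Count well-formed vs. malformed JSON lines per *.jsonl file.

    The field layout of the interleaved training data is not checked here;
    consult src/PeSFT/pesft.py for the keys it actually reads.
    """
    report = {}
    for path in sorted(Path(json_dir).glob("*.jsonl")):
        ok, bad = 0, 0
        with path.open(encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                try:
                    json.loads(line)
                    ok += 1
                except json.JSONDecodeError:
                    bad += 1
        report[path.name] = (ok, bad)
    return report
```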

PeRPO

Note

The end-to-end PeRPO training recipe is being organized and will be released in a more complete form soon.

Tip

PeRPO can be reproduced by following verl’s DAPO recipe. In volcengine/verl, you can directly follow:

  • verl/recipe/dapo

Then, plug in and reuse our reward functions in src/PeRPO/rewards.py as the reward module for the DAPO training loop.

Reward implementation: src/PeRPO/rewards.py

What PeRPO does:
  • RL refinement with group-relative optimization using a perception-calibrated reward:
    • Accuracy
    • Format
    • Perception
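The reward structure above can be sketched as follows. The weights and the group-relative normalization are illustrative assumptions, not the values used in src/PeRPO/rewards.py; the actual optimization follows verl's DAPO recipe.

```python
from statistics import mean, pstdev

def composite_reward(accuracy, fmt_ok, perception,
                     w_acc=1.0, w_fmt=0.5, w_per=0.5):
    """Weighted sum of the three reward terms (weights are illustrative)."""
    return w_acc * accuracy + w_fmt * float(fmt_ok) + w_per * perception

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO/DAPO-style normalization: each rollout's advantage is its
    reward relative to the mean and spread of its sampling group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```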

3. Inference

You can skip training and use the pretrained checkpoints below:

Checkpoints

Run

export INPUT_JSONL=/path/to/data.jsonl
export OUTDIR=outputs/demo_run
export MODEL=/path/to/ckpt
export PROCESSOR=/path/to/processor_ckpt

python src/Inference/inference.py \
  --input "$INPUT_JSONL" \
  --output-dir "$OUTDIR" \
  --model-path "$MODEL" \
  --processor-path "$PROCESSOR" \
  --do-sample \
  --temperature 1.0 \
  --top-p 0.9
Key arguments
  • --input: JSONL file (or a directory of JSONL files)
  • --output-dir: where predictions are saved
  • --model-path: your checkpoint
  • --processor-path: processor checkpoint path
  • --do-sample, --temperature, --top-p: sampling settings
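An input JSONL can be assembled with a few lines of Python. The `prompt`/`images` field names below are illustrative assumptions; check the loader in src/Inference/inference.py for the keys it actually expects.

```python
import json

# Hypothetical sample schema -- confirm field names against the
# inference script before running at scale.
samples = [
    {"prompt": "How many red blocks are in the image?",
     "images": ["demo/blocks.png"]},
    {"prompt": "Plot the next step of the construction.",
     "images": []},
]

with open("data.jsonl", "w", encoding="utf-8") as f:
    for s in samples:
        f.write(json.dumps(s, ensure_ascii=False) + "\n")
```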

4. Omni-Bench

Data

Download the dataset: https://huggingface.co/datasets/ModalityDance/Omni-Bench

Omni-Bench contains 800 samples spanning 4 Uni-Tasks:

  • Natural-Scene Perception: V*
  • Structured-Image: ArxivQA, ChartQA
  • Diagrammatic Math: Geometry3k, MathVista
  • Vision-Operational Scenes: ViC-Bench

Evaluation

python omni-bench/vllm_eval.py \
  --parquet_path omni-bench/omni-bench.parquet \
  --model_path /path/to/your_model \
  --outfile preds.jsonl \
  --mm_images_per_prompt 5
What this script does:
  • Loads Omni-Bench parquet
  • Runs batched inference with vLLM
  • Saves predictions in JSONL format (preds.jsonl)
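Once preds.jsonl is written, a simple exact-match score can be computed from it. The `prediction`/`answer` key names are assumptions; inspect one output line to confirm what vllm_eval.py actually emits.

```python
import json

def accuracy_from_jsonl(path, pred_key="prediction", gold_key="answer"):
    """Fraction of case-insensitive exact matches over all prediction rows.

    Key names are hypothetical -- adjust them to the fields your
    preds.jsonl really contains.
    """
    total = correct = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            row = json.loads(line)
            total += 1
            pred = str(row.get(pred_key, "")).strip().lower()
            gold = str(row.get(gold_key, "")).strip().lower()
            correct += pred == gold
    return correct / total if total else 0.0
```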

✨ How It Works

Omni-R1 learns to generate interleaved multimodal reasoning trajectories through a two-stage SFT → RL pipeline.

  • Omni-R1: trained on annotated interleaved multimodal trajectories.
  • Omni-R1-Zero: when such annotations are unavailable, bootstraps interleaved trajectories from text-only CoT by visualizing each reasoning step, then trains with the same pipeline.
  • PeSFT: performs supervised fine-tuning with cross-entropy plus a perception alignment loss to stabilize functional image generation.
  • PeRPO: refines the policy with group-relative RL on unified tasks using a composite reward: Accuracy, Format, and Perception.
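The Omni-R1-Zero bootstrapping step can be sketched at a high level: split a text-only chain of thought into steps, then attach a generated visualization to each. The step-splitting heuristic below is an illustrative assumption (the official bootstrapping implementation is still on the roadmap), and the image-generation call is deliberately omitted.

```python
import re

def split_cot_steps(cot_text):
    """Split a text-only chain of thought into candidate steps.

    In the Omni-R1-Zero pipeline each step would then be paired with a
    generated visualization; this heuristic only handles "Step N:" or
    "N." / "N)" style markers and is an assumption, not the released code.
    """
    steps = re.split(r"(?m)^(?:Step \d+[:.]|\d+[.)])\s*", cot_text)
    return [s.strip() for s in steps if s.strip()]
```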

A high-level overview is illustrated in the figure below.

Overview

🗂️ Project Structure

.
├── omni-bench/
│   ├── omni-bench.parquet         # Benchmark dataset (Available in HF)
│   └── vllm_eval.py               # vLLM inference / evaluation
│
└── src/
    ├── Inference/
    │   └── inference.py            # Inference
    │
    ├── PeRPO/
    │   └── rewards.py              # Perception reward utilities
    │
    ├── PeSFT/
    │   ├── perception.py           # Perception module
    │   ├── perception_module.ckpt  # Perception module checkpoint
    │   ├── pesft.py                # PeSFT training
    │   └── trainer.py              # Training utilities
    │
    └── transformers/

🌱 Acknowledgements

We would like to thank the contributors, open-source projects, and research communities whose work made Omni-R1 possible.

Anole · Zebra-CoT · M3CoT · verl · vLLM

This project is licensed under the MIT License. It also complies with the licenses of referenced third-party projects and dependencies, including the Chameleon Research License. Please refer to the LICENSE file for more details.

📚 Citation

If you use Omni-R1 in your research or applications, please consider citing:

@misc{cheng2026omnir1unifiedgenerativeparadigm,
      title={Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning}, 
      author={Dongjie Cheng and Yongqi Li and Zhixin Ma and Hongru Cai and Yupeng Hu and Wenjie Wang and Liqiang Nie and Wenjie Li},
      year={2026},
      eprint={2601.09536},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.09536}, 
}
