87 changes: 87 additions & 0 deletions examples/deepseek_ocr_setup.md
# DeepSeek-OCR GRPO End-to-End Runbook

This guide walks through setting up EasyR1 on a clean machine and launching GRPO training for DeepSeek-OCR.

## 1. Prepare the Environment

### Option A: Recommended Docker Image
1. Pull the pre-built EasyR1 image (bundles CUDA, flash-attn, transformers, vLLM):
```bash
docker pull hiyouga/verl:ngc-th2.8.0-cu12.9-vllm0.11.0
```
2. Start a container with access to all GPUs and the host IPC namespace (avoids shared-memory limits in multi-process training):
```bash
docker run -it --ipc=host --gpus=all hiyouga/verl:ngc-th2.8.0-cu12.9-vllm0.11.0
```
3. (Inside the container) clone and install EasyR1:
```bash
git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .
```

**One-shot command block (Docker):**
```bash
docker pull hiyouga/verl:ngc-th2.8.0-cu12.9-vllm0.11.0 && \
docker run -it --ipc=host --gpus=all --name easyr1_dsocr hiyouga/verl:ngc-th2.8.0-cu12.9-vllm0.11.0 /bin/bash -lc "\
git clone https://github.com/hiyouga/EasyR1.git && \
cd EasyR1 && \
pip install -e . && \
bash examples/deepseek_ocr_grpo.sh\
"
```

### Option B: Native Installation
1. Ensure Python 3.9+ is available.
2. Clone and install EasyR1; `pip install -e .` pulls in the required dependencies (transformers>=4.54.0, flash-attn>=2.4.3, vllm>=0.8.3):
```bash
git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .
```

**One-shot command block (Native, assumes CUDA/cuDNN already available):**
```bash
python3 -m venv ~/.venvs/easyr1 && \
source ~/.venvs/easyr1/bin/activate && \
pip install --upgrade pip && \
git clone https://github.com/hiyouga/EasyR1.git && \
cd EasyR1 && \
pip install -e . && \
bash examples/deepseek_ocr_grpo.sh
```

> Tip: If Hugging Face access is slow, set `export HF_ENDPOINT=https://hf-mirror.com` before downloading models.
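If you prefer to set the mirror from Python (e.g., at the top of a launcher script), the same environment variable applies; a minimal sketch using only the standard library — the variable must be set before `transformers` or `huggingface_hub` makes any network call:

```python
import os

# Point Hugging Face downloads at the mirror. This must run before
# transformers/huggingface_hub issues any download request.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
print(os.environ["HF_ENDPOINT"])
```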

## 2. Download or Mount the Model
- The default training script pulls `deepseek-ai/deepseek-ocr`. Replace `MODEL_PATH` with a local checkpoint if needed:
```bash
MODEL_PATH=/path/to/your/deepseek-ocr
```
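When pointing `MODEL_PATH` at a local directory, a quick sanity check can save a failed launch. A small sketch — the helper name is ours, and the file-name checks assume the usual Hugging Face checkpoint layout rather than anything specific to deepseek-ocr:

```python
from pathlib import Path

def looks_like_hf_checkpoint(model_path: str) -> bool:
    """Rough check that a directory resembles a Hugging Face-format
    checkpoint: a config.json plus at least one weights file."""
    root = Path(model_path)
    has_config = (root / "config.json").is_file()
    has_weights = any(root.glob("*.safetensors")) or any(root.glob("pytorch_model*.bin"))
    return has_config and has_weights
```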

## 3. Launch GRPO Training
Run the provided script (edit `MODEL_PATH` inside if you want a different checkpoint):
```bash
bash examples/deepseek_ocr_grpo.sh
```

> Troubleshooting: If you see `bash: examples/deepseek_ocr_grpo.sh: No such file or directory`, make sure you are inside the cloned repository root (it should contain the `examples/` folder). Run `pwd` to confirm you are in `EasyR1/`, or re-clone via the commands above and re-run from that directory.

To override parameters on the fly (e.g., a different GPU count or experiment name):
```bash
MODEL_PATH=/path/to/your/deepseek-ocr \
python3 -m verl.trainer.main \
config=examples/deepseek_ocr_grpo.yaml \
worker.actor.model.model_path=${MODEL_PATH} \
trainer.n_gpus_per_node=4 \
trainer.experiment_name=deepseek_ocr_custom_run
```

Key settings live in `examples/deepseek_ocr_grpo.yaml` (dataset, prompt template, GRPO/KL knobs, actor rollout, offload, and checkpointing). Adjust them as needed for your hardware or dataset variants.
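The `key=value` arguments above follow the usual dotted-path override convention: each dotted key walks into the nested YAML config and replaces one leaf. A toy sketch of that folding on plain dicts (illustrative only, not verl's actual config parser):

```python
def apply_override(config: dict, dotted_key: str, value):
    """Walk a nested dict along a dotted path, creating intermediate
    dicts as needed, and set the leaf value — mimicking overrides like
    worker.actor.model.model_path=... on the command line."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value
    return config

cfg = {"trainer": {"n_gpus_per_node": 8}}
apply_override(cfg, "trainer.n_gpus_per_node", 4)
apply_override(cfg, "worker.actor.model.model_path", "/path/to/your/deepseek-ocr")
print(cfg["trainer"]["n_gpus_per_node"])  # 4
```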

## 4. (Optional) Resume or Merge Checkpoints
- Enable automatic resume by keeping `trainer.find_last_checkpoint=true` (default).
- Merge an actor checkpoint to Hugging Face format after training:
```bash
python3 scripts/model_merger.py --local_dir checkpoints/easy_r1/<exp_name>/<global_step>/actor
```
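To pick `<global_step>` automatically for the merge command, you can select the highest step directory under the experiment folder. A sketch that assumes verl's `global_step_<N>` folder naming — adjust if your checkpoint layout differs:

```python
from pathlib import Path
from typing import Optional

def latest_actor_dir(exp_root: str) -> Optional[Path]:
    """Return <exp_root>/global_step_<N>/actor for the highest N,
    or None when no step directory exists."""
    steps = []
    for d in Path(exp_root).glob("global_step_*"):
        try:
            steps.append((int(d.name.rsplit("_", 1)[1]), d))
        except ValueError:
            continue  # skip folders that don't end in an integer step
    return max(steps)[1] / "actor" if steps else None
```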
11 changes: 10 additions & 1 deletion verl/models/transformers/deepseek_ocr.py
```diff
@@ -163,9 +163,18 @@ def deepseek_ocr_base_forward(
 ):
     position_ids = kwargs.get("position_ids")
     if isinstance(position_ids, torch.Tensor):
-        if position_ids.ndim != 3 or position_ids.size(0) not in (3, 4):
+        if position_ids.ndim == 3 and position_ids.size(0) not in (3, 4) and position_ids.size(1) in (3, 4):
+            # Accept batch-first position ids from the dataloader and transpose to
+            # the (3|4, batch_size, seq_length) shape expected by the model.
+            position_ids = position_ids.transpose(0, 1).contiguous()
+            kwargs["position_ids"] = position_ids
+        elif position_ids.ndim != 3 or position_ids.size(0) not in (3, 4):
             raise ValueError("position_ids should be a 3D tensor of shape (3|4, batch_size, seq_length).")
+
+        if position_ids.device != input_ids.device:
+            position_ids = position_ids.to(input_ids.device)
+            kwargs["position_ids"] = position_ids

     input_kwargs = _get_input_embeds(self, input_ids, attention_mask, pixel_values, image_grid_thw)
     kwargs.update(input_kwargs)
     outputs = self.language_model(input_ids=None, **kwargs)
```
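The shape rule in the patch above can be exercised without torch by working on bare shape tuples. The helper below is a hypothetical stand-in for the tensor logic (which uses `.transpose(0, 1).contiguous()`): batch-first `(batch, 3|4, seq)` inputs get their first two dimensions swapped, well-formed `(3|4, batch, seq)` inputs pass through, and anything else raises:

```python
def normalize_position_ids_shape(shape: tuple) -> tuple:
    """Mirror the patched shape handling on shape tuples only.
    Accepts batch-first (batch, 3|4, seq) by swapping the first two
    dims; passes through (3|4, batch, seq); rejects everything else."""
    if len(shape) == 3 and shape[0] not in (3, 4) and shape[1] in (3, 4):
        # Batch-first input: transpose to (3|4, batch_size, seq_length).
        return (shape[1], shape[0], shape[2])
    if len(shape) != 3 or shape[0] not in (3, 4):
        raise ValueError("position_ids should be a 3D tensor of shape (3|4, batch_size, seq_length).")
    return shape

print(normalize_position_ids_shape((8, 3, 512)))  # (3, 8, 512)
```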