87 changes: 87 additions & 0 deletions examples/deepseek_ocr_setup.md
# DeepSeek-OCR GRPO End-to-End Runbook

This guide walks through setting up EasyR1 on a clean machine and launching GRPO training for DeepSeek-OCR.

## 1. Prepare the Environment

### Option A: Recommended Docker Image
1. Pull the pre-built EasyR1 image (bundles CUDA, flash-attn, transformers, vLLM):
```bash
docker pull hiyouga/verl:ngc-th2.8.0-cu12.9-vllm0.11.0
```
2. Start a container with access to all GPUs and the host IPC namespace (avoids shared-memory limits in multi-process training):
```bash
docker run -it --ipc=host --gpus=all hiyouga/verl:ngc-th2.8.0-cu12.9-vllm0.11.0
```
3. (Inside the container) clone and install EasyR1:
```bash
git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .
```

**One-shot command block (Docker):**
```bash
docker pull hiyouga/verl:ngc-th2.8.0-cu12.9-vllm0.11.0 && \
docker run -it --ipc=host --gpus=all --name easyr1_dsocr hiyouga/verl:ngc-th2.8.0-cu12.9-vllm0.11.0 /bin/bash -lc "\
git clone https://github.com/hiyouga/EasyR1.git && \
cd EasyR1 && \
pip install -e . && \
bash examples/deepseek_ocr_grpo.sh\
"
```

### Option B: Native Installation
1. Ensure Python 3.9+ is available.
2. Clone and install EasyR1; `pip install -e .` pulls in the required dependencies (transformers>=4.54.0, flash-attn>=2.4.3, vllm>=0.8.3):
```bash
git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .
```

**One-shot command block (Native, assumes CUDA/cuDNN already available):**
```bash
python3 -m venv ~/.venvs/easyr1 && \
source ~/.venvs/easyr1/bin/activate && \
pip install --upgrade pip && \
git clone https://github.com/hiyouga/EasyR1.git && \
cd EasyR1 && \
pip install -e . && \
bash examples/deepseek_ocr_grpo.sh
```

> Tip: If Hugging Face access is slow, set `export HF_ENDPOINT=https://hf-mirror.com` before downloading models.
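If you prefer to set the mirror from Python (e.g., at the top of a launcher script), the same environment variable applies; a minimal sketch using only the standard library — the variable must be set before `transformers` or `huggingface_hub` makes any network call:

```python
import os

# Point Hugging Face downloads at the mirror. This must run before
# transformers/huggingface_hub issues any download request.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
print(os.environ["HF_ENDPOINT"])
```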

## 2. Download or Mount the Model
- The default training script pulls `deepseek-ai/deepseek-ocr`. Replace `MODEL_PATH` with a local checkpoint if needed:
```bash
MODEL_PATH=/path/to/your/deepseek-ocr
```
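When pointing `MODEL_PATH` at a local directory, a quick sanity check can save a failed launch. A small sketch — the helper name is ours, and the file-name checks assume the usual Hugging Face checkpoint layout rather than anything specific to deepseek-ocr:

```python
from pathlib import Path

def looks_like_hf_checkpoint(model_path: str) -> bool:
    """Rough check that a directory resembles a Hugging Face-format
    checkpoint: a config.json plus at least one weights file."""
    root = Path(model_path)
    has_config = (root / "config.json").is_file()
    has_weights = any(root.glob("*.safetensors")) or any(root.glob("pytorch_model*.bin"))
    return has_config and has_weights
```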

## 3. Launch GRPO Training
Run the provided script (edit `MODEL_PATH` inside if you want a different checkpoint):
```bash
bash examples/deepseek_ocr_grpo.sh
```

> Troubleshooting: If you see `bash: examples/deepseek_ocr_grpo.sh: No such file or directory`, make sure you are inside the cloned repository root (it should contain the `examples/` folder). Run `pwd` to confirm you are in `EasyR1/`, or re-clone via the commands above and re-run from that directory.

To override parameters on the fly (e.g., a different GPU count or experiment name):
```bash
MODEL_PATH=/path/to/your/deepseek-ocr \
python3 -m verl.trainer.main \
config=examples/deepseek_ocr_grpo.yaml \
worker.actor.model.model_path=${MODEL_PATH} \
trainer.n_gpus_per_node=4 \
trainer.experiment_name=deepseek_ocr_custom_run
```

Key settings live in `examples/deepseek_ocr_grpo.yaml` (dataset, prompt template, GRPO/KL knobs, actor rollout, offload, and checkpointing). Adjust them as needed for your hardware or dataset variants.
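The `key=value` arguments above follow the usual dotted-path override convention: each dotted key walks into the nested YAML config and replaces one leaf. A toy sketch of that folding on plain dicts (illustrative only, not verl's actual config parser):

```python
def apply_override(config: dict, dotted_key: str, value):
    """Walk a nested dict along a dotted path, creating intermediate
    dicts as needed, and set the leaf value — mimicking overrides like
    worker.actor.model.model_path=... on the command line."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value
    return config

cfg = {"trainer": {"n_gpus_per_node": 8}}
apply_override(cfg, "trainer.n_gpus_per_node", 4)
apply_override(cfg, "worker.actor.model.model_path", "/path/to/your/deepseek-ocr")
print(cfg["trainer"]["n_gpus_per_node"])  # 4
```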

## 4. (Optional) Resume or Merge Checkpoints
- Enable automatic resume by keeping `trainer.find_last_checkpoint=true` (default).
- Merge an actor checkpoint to Hugging Face format after training:
```bash
python3 scripts/model_merger.py --local_dir checkpoints/easy_r1/<exp_name>/<global_step>/actor
```
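To pick `<global_step>` automatically for the merge command, you can select the highest step directory under the experiment folder. A sketch that assumes verl's `global_step_<N>` folder naming — adjust if your checkpoint layout differs:

```python
from pathlib import Path
from typing import Optional

def latest_actor_dir(exp_root: str) -> Optional[Path]:
    """Return <exp_root>/global_step_<N>/actor for the highest N,
    or None when no step directory exists."""
    steps = []
    for d in Path(exp_root).glob("global_step_*"):
        try:
            steps.append((int(d.name.rsplit("_", 1)[1]), d))
        except ValueError:
            continue  # skip folders that don't end in an integer step
    return max(steps)[1] / "actor" if steps else None
```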
11 changes: 10 additions & 1 deletion verl/models/transformers/deepseek_ocr.py
```diff
@@ -163,9 +163,18 @@ def deepseek_ocr_base_forward(
 ):
     position_ids = kwargs.get("position_ids")
     if isinstance(position_ids, torch.Tensor):
-        if position_ids.ndim != 3 or position_ids.size(0) not in (3, 4):
+        if position_ids.ndim == 3 and position_ids.size(0) not in (3, 4) and position_ids.size(1) in (3, 4):
+            # Accept batch-first position ids from the dataloader and transpose to
+            # the (3|4, batch_size, seq_length) shape expected by the model.
+            position_ids = position_ids.transpose(0, 1).contiguous()
+            kwargs["position_ids"] = position_ids
+        elif position_ids.ndim != 3 or position_ids.size(0) not in (3, 4):
             raise ValueError("position_ids should be a 3D tensor of shape (3|4, batch_size, seq_length).")
+
+        if position_ids.device != input_ids.device:
+            position_ids = position_ids.to(input_ids.device)
+            kwargs["position_ids"] = position_ids

     input_kwargs = _get_input_embeds(self, input_ids, attention_mask, pixel_values, image_grid_thw)
     kwargs.update(input_kwargs)
     outputs = self.language_model(input_ids=None, **kwargs)
```
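The shape rule in the patch above can be exercised without torch by working on bare shape tuples. The helper below is a hypothetical stand-in for the tensor logic (which uses `.transpose(0, 1).contiguous()`): batch-first `(batch, 3|4, seq)` inputs get their first two dimensions swapped, well-formed `(3|4, batch, seq)` inputs pass through, and anything else raises:

```python
def normalize_position_ids_shape(shape: tuple) -> tuple:
    """Mirror the patched shape handling on shape tuples only.
    Accepts batch-first (batch, 3|4, seq) by swapping the first two
    dims; passes through (3|4, batch, seq); rejects everything else."""
    if len(shape) == 3 and shape[0] not in (3, 4) and shape[1] in (3, 4):
        # Batch-first input: transpose to (3|4, batch_size, seq_length).
        return (shape[1], shape[0], shape[2])
    if len(shape) != 3 or shape[0] not in (3, 4):
        raise ValueError("position_ids should be a 3D tensor of shape (3|4, batch_size, seq_length).")
    return shape

print(normalize_position_ids_shape((8, 3, 512)))  # (3, 8, 512)
```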