Conversation

@chenchaoxu7575 (Contributor) commented Dec 17, 2025

What does this PR do?

Isaac Server Mode for Multi-Task LIBERO

This PR introduces a decoupled Isaac Lab simulation architecture that separates model inference (Gen) from physics simulation (Sim), enabling efficient multi-task reinforcement learning with pipeline parallelism. The simulation is managed via Ray actors and supports multi-node deployment.

Key Features:

  1. IsaacServer - Ray actor that wraps Isaac Lab environment, runs on sim GPUs with multi-task support
  2. IsaacServerManager - Manages multiple IsaacServers across stages and GPUs, handles task-to-server routing
  3. EnvWorkerServer - Lightweight coordinator that routes actions to correct servers via traj_key mapping
  4. TaskBalancedSampler - Ensures balanced task distribution in batches, with per-stage interleaving (tightly coupled with EnvWorkerServer stage assignment)
  5. Pipeline-Parallel Rollout - Each stage has isolated servers, enabling GPU time-multiplexing between simulation and generation (see the timeline sketch after this list)
  6. Multi-Node Sim Support - Sim nodes can be distributed across multiple machines via Ray cluster
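
Schematically, the intended two-stage Gen/Sim overlap looks like this (illustrative timeline, not taken from the PR):

time →        step k        step k+1      step k+2
Train GPUs:   Gen(stage 0)  Gen(stage 1)  Gen(stage 0)  ...
Sim GPUs:     Sim(stage 1)  Sim(stage 0)  Sim(stage 1)  ...

While one stage runs model inference on the train node, the other stage's actions are simulated on the sim nodes, so neither set of GPUs idles.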

Test

Validated on the LIBERO-10 benchmark with 10 manipulation tasks across 3 scenes (living room, kitchen, study).

Test Configuration:

  • 2 SIM Nodes with 10 GPUs total for Isaac Simulation
  • 1 TRAIN Node with 8 GPUs for model inference
  • 2 Pipeline Stages (matching server groups)
  • 10 tasks × 16 envs/task = 160 envs per stage
  • 256×256 camera resolution
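
As a quick sanity check on these numbers (plain arithmetic, not from the PR):

num_tasks, group_size, stage_num = 10, 16, 2
envs_per_stage = num_tasks * group_size      # 10 * 16 = 160, as listed above
total_sim_envs = envs_per_stage * stage_num  # 320 isolated envs across both stages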

Results:

  • Successfully completed multi-task rollouts across all 10 LIBERO tasks
  • Pipeline overlap achieved between Gen and Sim
  • No environment state corruption between stages (verified via video recordings)
  • Ray timeline trace (screenshot not reproduced here)

API and Usage Example

1. Start Ray Cluster (Multi-Node Setup):

# On head node (TRAIN NODE)
ray start --head --port=6379

# On SIM nodes (join cluster)
ray start --address=<head_node_ip>:6379 --resources='{"sim": 1}'
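
For reference, a minimal sketch of how the sim resource label constrains actor placement (illustrative probe actor, not the PR's IsaacServer):

import ray

ray.init(address="auto")  # join the cluster started above

# Schedulable only on nodes started with --resources='{"sim": 1}';
# num_gpus=0.5 mirrors the 2-stage GPU time-sharing described below.
@ray.remote(num_gpus=0.5, resources={"sim": 0.1})
class SimProbe:
    def where(self):
        import socket
        return socket.gethostname()

print(ray.get(SimProbe.remote().where.remote()))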

2. Run Training with Ray-managed Isaac Servers:

cd verl/recipe/vla

# Key configuration parameters
export NUM_TASKS=10               # LIBERO-10 tasks
export GROUP_SIZE=16              # Envs per task per stage
export STAGE_NUM=2                # Pipeline stages
export NUM_ISAAC_SERVERS=10       # Total sim GPUs
export SIM_NODES=2                # Number of sim nodes

# Launch training (servers are created automatically by Ray)
./run_simpleVLA_isaac_disagg_server.sh

3. Configuration in YAML:

env:
  train:
    isaac_server_mode: True       # Enable Ray actor mode
    num_isaac_servers: 10         # Servers per stage
    num_tasks: 10
    group_size: 16                # Envs per task
    total_trajs: 128              # Total trajectories for training
  rollout:
    pipeline_stage_num: 2
  disagg_sim:
    enable: True
    nnodes: 2                     # Number of sim nodes

4. Using TaskBalancedSampler in code:

from recipe.vla.workers.env import create_task_balanced_sampler

sampler = create_task_balanced_sampler(
    dataset=train_dataset,
    batch_size=32,
    max_per_task=16,      # <= GROUP_SIZE
    stage_num=2,          # Match pipeline stages
    seed=42,
)
# Note: Sampler's interleaving is tightly coupled with 
# EnvWorkerServer.reset_envs_to_state_ids() stage assignment

Design & Code Changes

Architecture Diagram:

┌─────────────────────────────────────────────────────────────────────────────┐
│                           Ray Cluster                                        │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                    TRAIN NODE (Gen - Model Inference)                  │  │
│  │  ┌──────────────┐         ┌─────────────────────────────┐              │  │
│  │  │  VeRL        │         │      EnvWorkerServer        │              │  │
│  │  │  Framework   │────────▶│   (Lightweight Coordinator) │              │  │
│  │  │  + EnvLoop   │         │   traj_key → env mapping    │              │  │
│  │  └──────────────┘         └──────────────┬──────────────┘              │  │
│  └───────────────────────────────────────────┼────────────────────────────┘  │
│                                              │                               │
│                    ┌─────────────────────────┼─────────────────────────┐     │
│                    │  IsaacServerManager     │                         │     │
│                    │  - Task → Server routing                          │     │
│                    │  - Batched step/reset                             │     │
│                    └─────────────────────────┼─────────────────────────┘     │
│                                              │ Ray Actor Calls               │
│  ════════════════════════════════════════════╪═══════════════════════════    │
│  ┌───────────────────────────────────────────┼───────────────────────────┐   │
│  │                 SIM NODES (Multi-Node Support)                        │   │
│  │                                           │                           │   │
│  │   Stage 0 Servers          Stage 1 Servers                            │   │
│  │   ┌─────────────────┐      ┌─────────────────┐                        │   │
│  │   │ IsaacServer 0   │      │ IsaacServer 0   │  (GPU 0, time-shared)  │   │
│  │   │ Tasks: 0-1      │      │ Tasks: 0-1      │                        │   │
│  │   ├─────────────────┤      ├─────────────────┤                        │   │
│  │   │ IsaacServer 1   │      │ IsaacServer 1   │  (GPU 1, time-shared)  │   │
│  │   │ Tasks: 2-3      │      │ Tasks: 2-3      │                        │   │
│  │   ├─────────────────┤      ├─────────────────┤                        │   │
│  │   │ ...             │      │ ...             │                        │   │
│  │   └─────────────────┘      └─────────────────┘                        │   │
│  └───────────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────────┘


File Changes:

File                                              Description
recipe/vla/isaac_server/isaac_server.py           IsaacServer: Ray actor wrapping the Isaac Lab environment
recipe/vla/isaac_server/isaac_server_manager.py   IsaacServerManager: manages servers across stages and GPUs
recipe/vla/workers/env/env_worker_server.py       EnvWorkerServer: lightweight coordinator with traj_key routing
recipe/vla/workers/env/utils.py                   TaskBalancedSampler for per-stage task balancing
recipe/vla/env_loop.py                            EnvLoop with stage-aware traj_key passing
recipe/vla/config/rob_ppo_trainer.yaml            Configuration for Isaac server mode
recipe/vla/run_simpleVLA_isaac_disagg_server.sh   Launch script for Ray-based Isaac server mode

Key Design Decisions:

  1. Ray Actor Architecture: Isaac servers are Ray actors, enabling unified resource management across train and sim nodes. No manual server startup needed.

  2. Multi-Node Sim Support: Sim nodes join Ray cluster with custom resource label (sim), allowing IsaacServers to be scheduled to appropriate nodes.

  3. Traj-Env 1:1 Mapping: Each trajectory maps to exactly one sim env via traj_key, enabling flexible env deployment without group constraints.

  4. Stage Isolation: Each pipeline stage has its own set of servers, physically isolated. Stages time-share GPUs (e.g., 2 stages → 0.5 GPU/server).

  5. Coupled Stage Assignment: traj_idx % stage_num logic in reset_envs_to_state_ids() MUST match TaskBalancedSampler's interleaving.
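
To make decision 5 concrete, here is a minimal sketch of the shared invariant (illustrative code, not the PR's implementation):

def stage_of(traj_idx: int, stage_num: int) -> int:
    # TaskBalancedSampler's interleaving and reset_envs_to_state_ids()
    # must both follow this mapping, or trajectories get reset on the wrong stage.
    return traj_idx % stage_num

stage_num = 2
batch = list(range(8))  # toy batch of trajectory indices
per_stage = {s: [i for i in batch if stage_of(i, stage_num) == s] for s in range(stage_num)}
assert per_stage == {0: [0, 2, 4, 6], 1: [1, 3, 5, 7]}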

Data Flow Example

batch_size=64, stage_num=2, num_tasks=10, group_size=16, servers=10

TaskBalancedSampler produces: [t0, t1, t2, t3, ..., t63] (interleaved by stage)
                               ↓    ↓   ↓   ↓
Stage assignment:              S0   S1  S0  S1  ...

Stage 0 gets: trajs [0,2,4,...,62] = 32 trajs → 32 sim envs
Stage 1 gets: trajs [1,3,5,...,63] = 32 trajs → 32 sim envs

Each traj gets a traj_key:
  traj_key="a1b2c3d4" → {env_index=5, task_id=3, stage_id=0, server_rank=0}
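
A hedged sketch of the per-traj_key routing record this mapping implies (field names follow the example above; the actual EnvWorkerServer code may differ):

from dataclasses import dataclass

@dataclass
class TrajRoute:
    env_index: int    # env slot inside the target IsaacServer
    task_id: int      # LIBERO task this trajectory runs
    stage_id: int     # pipeline stage that owns the server
    server_rank: int  # IsaacServer index within that stage

routes = {"a1b2c3d4": TrajRoute(env_index=5, task_id=3, stage_id=0, server_rank=0)}

def lookup(traj_key: str) -> TrajRoute:
    # EnvWorkerServer resolves traj_key -> (stage, server, env slot)
    # before issuing the Ray actor call; shown here as a plain dict lookup.
    return routes[traj_key]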


@gemini-code-assist (bot) left a comment:

Code Review

This pull request introduces a significant new feature, the Isaac Server Mode, which decouples simulation from the training loop. The architecture is well-designed, using ZMQ for communication and a multi-group server setup to support pipeline parallelism. The code is generally well-structured, with clear separation of concerns between the client, server, and env loop logic. However, I've found a few critical issues related to handling edge cases, specifically when a batch contains zero environments or trajectories. These cases can lead to unhandled exceptions and crash the application. Addressing these will make the implementation much more robust.

@CLAassistant commented Dec 17, 2025

CLA assistant check
All committers have signed the CLA.

@chenchaoxu7575 force-pushed the vla-intg-isaac-server-multitask branch from ea1e8d3 to 4e86cb7 on December 22, 2025 at 07:16
@chenchaoxu7575 changed the title from "[WIP][recipe, VLA] feat: support isaac server mode for multitask libero config" to "[recipe, VLA] feat: support isaac server mode for multitask libero config" on Dec 23, 2025
@chenchaoxu7575 marked this pull request as ready for review on December 23, 2025 at 10:06
@chenchaoxu7575 (Author) commented:

@chenhaiq @HanlinDu
Please review this PR.

- Add IsaacServer and IsaacServerManager for Ray-based Isaac Lab simulation
- Add EnvWorkerServer as lightweight adapter for Isaac server mode
- Add TaskBalancedSampler to ensure per-task env capacity is not exceeded
- Update env_loop to support pipeline stages with traj-env mapping
- Add run script for isaac server mode (run_simpleVLA_isaac_disagg_server.sh)
@chenchaoxu7575 force-pushed the vla-intg-isaac-server-multitask branch from 6205f58 to aa06b72 on January 8, 2026 at 10:33
Upstream changed compute_log_prob return type from tuple to dict in verl-project#4678.
Update RobDataParallelPPOActor.compute_log_prob to match the new interface.
- Remove redundant camera_height/camera_width from config (use init_params)
- Update shell script to use init_params.camera_heights/widths
- Rename USE_RAY_ACTORS to ISAAC_SERVER_MODE for consistency
- Update env_worker_server to read camera config from init_params
- Add isaac_server_mode check in env_loop for 1:1 traj-env mapping
- Restore code order in rob_ray_trainer
@chenchaoxu7575 (Author) commented:

Isaac Servers are managed by Ray now.
For those interested in the TCP/ZMQ Isaac Server implementation (before the migration to Ray actors), please refer to the backup branch up to commit d455059: https://github.com/chenchaoxu7575/verl/commits/vla-intg-isaac-server-multitask-backup/

@chenhaiq self-requested a review on January 19, 2026 at 07:38
@chenhaiq (Collaborator) left a comment:

Can you add some unit tests? For example: an integration test for EnvLoop + Isaac.

reset_results = ray.get(reset_future[0])

# Debug: print reset_results structure
print(f"[DEBUG reset_results] type: {type(reset_results)}", flush=True)
Collaborator:

please use logger.debug

logger.info(f"[Stage {self.stage_id} Actor {self.actor_rank}] Initializing Isaac environment: {self.env_id}")

# Detect GPU
num_gpus = torch.cuda.device_count()
Collaborator:

AssertionError: file /home/runner/work/verl/verl/verl/experimental/vla/isaac_server/isaac_server.py contains .cuda/"cuda"/"nccl" usage, please use api in verl/utils/device.py directly.


# Use print to ensure visibility in Ray logs
cleared_msg = " (cleared)" if clear_cache else ""
print(f"[Stage {stage_id} Rank {server_rank}] Cache directories configured{cleared_msg}:", flush=True)
Collaborator:

please use logger instead of print

video_base_dir: /tmp/videos
num_envs: 16
seed: 42
task_suite_name: libero_10
Collaborator:

please add a section to verl/experimental/vla/readme.md about how to configure isaac server mode

if is_last_chunk:
    self.env.unwrapped.cfg.sim.render_interval = original_render_interval
else:
    self.env.unwrapped.cfg.sim.render_interval = 999999
Collaborator:

please add a comment about why it is 999999


# When stage_num > 1, each stage gets batch_size/stage_num samples
# and each stage has its own max_per_task constraint
self.samples_per_stage = batch_size // stage_num
Collaborator:

please assert batch_size % stage_num == 0
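
For instance (a standalone sketch of the requested guard; in the PR it would sit next to the snippet above):

def samples_per_stage(batch_size: int, stage_num: int) -> int:
    # Fail loudly instead of silently dropping samples via integer division.
    assert batch_size % stage_num == 0, (
        f"batch_size ({batch_size}) must be divisible by stage_num ({stage_num})"
    )
    return batch_size // stage_num

assert samples_per_stage(64, 2) == 32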


SAVE_VIDEO=False

export PYTHONRECURSIONLIMIT=10000
Collaborator:

Why do we need PYTHONRECURSIONLIMIT?

    trajectory_chunks = data_proto.chunk(self.total_trajs)
else:
    # Local mode: each trajectory has num_envs_per_worker envs
    num_trajectories = self.total_trajs // self.num_envs_per_worker
Contributor:

Maybe we should add an assertion here to check for divisibility. Currently, the TensorDict.chunk() method used by DataProto does not perform such a check and simply uses integer division.

logger.info(f"[Stage {self.stage_id} Actor {self.actor_rank}] Visible GPUs: {num_gpus}, using {self.device}")

# Import Isaac Lab components - follow IsaacEnv pattern exactly
import gymnasium as gym
Contributor:

Please place these imports together at the beginning of the file, unless they are conditional imports.

- Add module-level comments explaining Isaac Sim import order requirements
- Import torch inside methods after AppLauncher initialization (cached, no overhead)
- Replace print() with logger calls in isaac_server and env_worker_server
- Add divisibility checks for isaac_server_mode in env_loop and utils
- Fix camera parameter retrieval from init_params in env_worker_server
- Update PYTHONRECURSIONLIMIT comment with TODO for verification
- Simplify render_interval logic using decimation parameter
- Correct variable naming from 'chunk' to 'action' in _handle_chunk_step