[recipe, VLA] feat: support isaac server mode for multitask libero config #4578
base: main
Conversation
Code Review
This pull request introduces a significant new feature, the Isaac Server Mode, which decouples simulation from the training loop. The architecture is well-designed, using ZMQ for communication and a multi-group server setup to support pipeline parallelism. The code is generally well-structured, with clear separation of concerns between the client, server, and env loop logic. However, I've found a few critical issues related to handling edge cases, specifically when a batch contains zero environments or trajectories. These cases can lead to unhandled exceptions and crash the application. Addressing these will make the implementation much more robust.
Force-pushed from ea1e8d3 to 4e86cb7.
- Add IsaacServer and IsaacServerManager for Ray-based Isaac Lab simulation
- Add EnvWorkerServer as a lightweight adapter for Isaac server mode
- Add TaskBalancedSampler to ensure per-task env capacity is not exceeded
- Update env_loop to support pipeline stages with traj-env mapping
- Add run script for Isaac server mode (run_simpleVLA_isaac_disagg_server.sh)
Force-pushed from 6205f58 to aa06b72.
Upstream changed compute_log_prob return type from tuple to dict in verl-project#4678. Update RobDataParallelPPOActor.compute_log_prob to match the new interface.
- Remove redundant camera_height/camera_width from config (use init_params)
- Update shell script to use init_params.camera_heights/widths
- Rename USE_RAY_ACTORS to ISAAC_SERVER_MODE for consistency
- Update env_worker_server to read camera config from init_params
- Add isaac_server_mode check in env_loop for 1:1 traj-env mapping
- Restore code order in rob_ray_trainer
Isaac Servers are managed by Ray now.
chenhaiq left a comment:
Can you add some unit tests? For example: an integration test for EnvLoop + Isaac.
recipe/vla/env_loop_server.py (Outdated)
```python
reset_results = ray.get(reset_future[0])

# Debug: print reset_results structure
print(f"[DEBUG reset_results] type: {type(reset_results)}", flush=True)
```
Please use `logger.debug`.
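A minimal sketch of the suggested fix, assuming a module-level logger is (or can be) configured:

```python
import logging

import ray

logger = logging.getLogger(__name__)

# Same information as the print above, but filterable by log level.
reset_results = ray.get(reset_future[0])
logger.debug("[reset_results] type: %s", type(reset_results))
```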
```python
logger.info(f"[Stage {self.stage_id} Actor {self.actor_rank}] Initializing Isaac environment: {self.env_id}")

# Detect GPU
num_gpus = torch.cuda.device_count()
```
AssertionError: file /home/runner/work/verl/verl/verl/experimental/vla/isaac_server/isaac_server.py contains .cuda/"cuda"/"nccl" usage, please use api in verl/utils/device.py directly.
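A sketch of the device-agnostic variant the CI check asks for; it assumes `verl.utils.device.get_torch_device()` returns the active torch device module (`torch.cuda`, `torch.npu`, ...), as that helper is used elsewhere in verl:

```python
from verl.utils.device import get_torch_device  # assumed helper, per the CI hint

# Device-agnostic GPU count instead of calling torch.cuda directly.
num_gpus = get_torch_device().device_count()
```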
```python
# Use print to ensure visibility in Ray logs
cleared_msg = " (cleared)" if clear_cache else ""
print(f"[Stage {stage_id} Rank {server_rank}] Cache directories configured{cleared_msg}:", flush=True)
```
Please use `logger` instead of `print`.
```yaml
video_base_dir: /tmp/videos
num_envs: 16
seed: 42
task_suite_name: libero_10
```
Please document how to configure Isaac server mode in verl/experimental/vla/readme.md.
```python
if is_last_chunk:
    self.env.unwrapped.cfg.sim.render_interval = original_render_interval
else:
    self.env.unwrapped.cfg.sim.render_interval = 999999
```
Please add a comment explaining why it is 999999.
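A hedged sketch of what such a comment could say; the rationale is inferred from the surrounding logic and should be confirmed by the author:

```python
if is_last_chunk:
    # Restore the normal cadence so the final chunk actually renders frames.
    self.env.unwrapped.cfg.sim.render_interval = original_render_interval
else:
    # 999999 effectively disables rendering for intermediate chunks: the
    # interval exceeds any realistic step count, so no frame is produced,
    # saving simulation time when the frames would be discarded anyway.
    self.env.unwrapped.cfg.sim.render_interval = 999999
```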
```python
# When stage_num > 1, each stage gets batch_size/stage_num samples
# and each stage has its own max_per_task constraint
self.samples_per_stage = batch_size // stage_num
```
Please assert `batch_size % stage_num == 0`.
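A sketch of the requested check, using the names from the snippet above:

```python
assert batch_size % stage_num == 0, (
    f"batch_size ({batch_size}) must be divisible by stage_num ({stage_num})"
)
self.samples_per_stage = batch_size // stage_num
```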
```bash
SAVE_VIDEO=False

export PYTHONRECURSIONLIMIT=10000
```
Why do we need `PYTHONRECURSIONLIMIT`?
```python
    trajectory_chunks = data_proto.chunk(self.total_trajs)
else:
    # Local mode: each trajectory has num_envs_per_worker envs
    num_trajectories = self.total_trajs // self.num_envs_per_worker
```
Maybe we should add an assertion here to check for divisibility. Currently, the TensorDict.chunk() method used by DataProto does not perform such a check and simply uses integer division.
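A sketch of the suggested assertion for the local-mode branch:

```python
assert self.total_trajs % self.num_envs_per_worker == 0, (
    f"total_trajs ({self.total_trajs}) must be divisible by "
    f"num_envs_per_worker ({self.num_envs_per_worker})"
)
num_trajectories = self.total_trajs // self.num_envs_per_worker
```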
```python
logger.info(f"[Stage {self.stage_id} Actor {self.actor_rank}] Visible GPUs: {num_gpus}, using {self.device}")

# Import Isaac Lab components - follow IsaacEnv pattern exactly
import gymnasium as gym
```
Please place these imports together at the beginning of the file, unless they are conditional imports.
- Add module-level comments explaining Isaac Sim import order requirements
- Import torch inside methods after AppLauncher initialization (cached, no overhead)
- Replace print() with logger calls in isaac_server and env_worker_server
- Add divisibility checks for isaac_server_mode in env_loop and utils
- Fix camera parameter retrieval from init_params in env_worker_server
- Update PYTHONRECURSIONLIMIT comment with TODO for verification
- Simplify render_interval logic using decimation parameter
- Correct variable naming from 'chunk' to 'action' in _handle_chunk_step
What does this PR do?
Isaac Server Mode for Multi-Task LIBERO
This PR introduces a decoupled Isaac Lab simulation architecture that separates model inference (Gen) from physics simulation (Env), enabling efficient multi-task reinforcement learning with pipeline parallelism. The simulation is managed via Ray actors, supporting multi-node deployment.
Key Features:
- IsaacServer and IsaacServerManager Ray actors for Ray-managed Isaac Lab simulation
- EnvWorkerServer as a lightweight adapter for Isaac server mode
- TaskBalancedSampler to keep per-task env capacity within limits
- Pipeline-stage support in env_loop with 1:1 traj-env mapping
- Run script for Isaac server mode (run_simpleVLA_isaac_disagg_server.sh)
Test
Validated on LIBERO-10 benchmark with 10 manipulation tasks across 3 scenes (living room, kitchen, study).
Test Configuration:
Results:
API and Usage Example
1. Start Ray Cluster (Multi-Node Setup):
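The original command block was lost in extraction; a hedged sketch follows. Addresses, ports, and the `sim` count are illustrative; the custom `sim` resource label comes from the design notes below:

```bash
# On the head (train) node:
ray start --head --port=6379

# On each sim node: join the cluster and advertise the "sim" resource so
# IsaacServers can be scheduled onto it.
ray start --address=<head_node_ip>:6379 --resources='{"sim": 8}'
```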
2. Run Training with Ray-managed Isaac Servers:
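The original block was lost; a sketch assuming the run script from this PR's file list is invoked directly:

```bash
bash recipe/vla/run_simpleVLA_isaac_disagg_server.sh
```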
3. Configuration in YAML:
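The original YAML was lost; a hedged reconstruction from the snippets reviewed above. Only `video_base_dir`, `num_envs`, `seed`, and `task_suite_name` appear verbatim in the diff; the surrounding key names and camera values are assumptions:

```yaml
env:
  isaac_server_mode: true      # assumed flag name, per the env_loop check
  video_base_dir: /tmp/videos
  num_envs: 16
  seed: 42
  task_suite_name: libero_10
  init_params:
    camera_heights: 256        # illustrative value
    camera_widths: 256         # illustrative value
```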
4. Using TaskBalancedSampler in code:
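The original snippet was lost; a hypothetical usage sketch. The import path and constructor signature are assumptions inferred from the commit message ("ensure per-task env capacity is not exceeded") and the stage-interleaving design note below:

```python
from torch.utils.data import DataLoader

# Path and signature are assumptions for illustration only.
from recipe.vla.workers.env.utils import TaskBalancedSampler

sampler = TaskBalancedSampler(
    dataset,            # multi-task LIBERO dataset
    batch_size=64,
    stage_num=2,        # pipeline stages; samples interleave across stages
    max_per_task=16,    # per-task env capacity per stage
)
dataloader = DataLoader(dataset, batch_sampler=sampler)
```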
Design & Code Changes
Architecture Diagram:
File Changes:
- recipe/vla/isaac_server/isaac_server.py
- recipe/vla/isaac_server/isaac_server_manager.py
- recipe/vla/workers/env/env_worker_server.py
- recipe/vla/workers/env/utils.py
- recipe/vla/env_loop.py
- recipe/vla/config/rob_ppo_trainer.yaml
- recipe/vla/run_simpleVLA_isaac_disagg_server.sh

Key Design Decisions:
- **Ray Actor Architecture**: Isaac servers are Ray actors, enabling unified resource management across train and sim nodes. No manual server startup needed.
- **Multi-Node Sim Support**: Sim nodes join the Ray cluster with a custom resource label (`sim`), allowing IsaacServers to be scheduled to the appropriate nodes.
- **Traj-Env 1:1 Mapping**: Each trajectory maps to exactly one sim env via `traj_key`, enabling flexible env deployment without group constraints.
- **Stage Isolation**: Each pipeline stage has its own set of servers, physically isolated. Stages time-share GPUs (e.g., 2 stages → 0.5 GPU/server).
- **Coupled Stage Assignment**: The `traj_idx % stage_num` logic in `reset_envs_to_state_ids()` MUST match `TaskBalancedSampler`'s interleaving; see the worked example below.

Data Flow Example
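The original example content was lost in extraction; a hedged sketch of the stage-interleaving flow described above:

```python
# With stage_num = 2, trajectories are assigned to stages by index parity,
# mirroring the traj_idx % stage_num logic that TaskBalancedSampler's
# interleaving must match.
stage_num = 2
for traj_idx in range(8):
    stage_id = traj_idx % stage_num
    print(f"traj {traj_idx} -> stage {stage_id}")
# traj 0,2,4,6 -> stage 0; traj 1,3,5,7 -> stage 1
```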
Checklist Before Submitting
- Run `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`.
- Request CI through the `ci-request` channel in the `verl` Slack workspace. (If not accessible, please try the Feishu group (飞书群).)