Isaac ROS Image Segmentation (+ SAM3)

Fork of NVIDIA-ISAAC-ROS/isaac_ros_image_segmentation. Adds SAM3 (Meta, 2025) as a ROS 2 node with a PyTorch-native backend.

SAM3 on ROS 2

SAM3 is Meta's open-vocabulary segmentation model, released in 2025, which performs grounded segmentation from text prompts. This fork integrates it into the Isaac ROS image segmentation packages as a Python-based ROS 2 node.

Key characteristics:

Open-vocabulary — segment by text prompt ("robot arm", "person", etc.)
Full SAM3 — ViT-H backbone, FPN neck, grounding decoder (not a distilled variant)
PyTorch-native — no ONNX or Triton; runs directly with torch.inference_mode()

Why PyTorch-native?

The natural path for Isaac ROS is to export models to ONNX → TensorRT for optimized inference. That didn't work here.

torch.export (which the torch_tensorrt export path relies on) fails on the SAM3 decoder because pin_memory is not supported in export mode, and ONNX export hits the same wall. TensorRT does work for the vision encoder on its own, which is why the optional TRT vision path exists; the decoder stays in PyTorch.

Instead, the optimizations applied are:

BF16 autocast (pytorch_amp_bf16, default True) — wraps the full pipeline, ~2× vision encoder speedup with no quality loss
torch.compile decoder (pytorch_compile_decoder, default True) — ~3× decoder speedup via inductor
torch.compile vision encoder (pytorch_compile_vision, optional) — additional speedup if TRT not used
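
How these switches might compose can be sketched with a stand-in module. This is a minimal illustration, not the node's actual code: `TinyDecoder` and `build_runner` are hypothetical names introduced here, and the only real APIs used are `torch.compile`, `torch.autocast`, and `torch.inference_mode`.

```python
import torch
from torch import nn


class TinyDecoder(nn.Module):
    """Hypothetical stand-in for the SAM3 decoder (not the real module)."""

    def __init__(self, dim: int = 16) -> None:
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


def build_runner(model: nn.Module, device: str, amp_bf16: bool = True,
                 compile_model: bool = True):
    """Return a callable that runs `model` with the optimizations listed above.

    `device` must be a bare device type ("cuda" or "cpu"), because it is also
    passed to torch.autocast as `device_type`.
    """
    model = model.to(device).eval()
    if compile_model:
        # Inductor is the default torch.compile backend; compilation itself
        # happens lazily on the first call.
        model = torch.compile(model)

    def run(x: torch.Tensor) -> torch.Tensor:
        # inference_mode disables autograd tracking; autocast runs eligible
        # ops (such as linear layers) in bfloat16 when enabled.
        with torch.inference_mode(), torch.autocast(
            device_type=device, dtype=torch.bfloat16, enabled=amp_bf16
        ):
            return model(x.to(device))

    return run
```

For example, `build_runner(TinyDecoder(), "cuda")` would mirror the defaults `pytorch_amp_bf16=True` and `pytorch_compile_decoder=True`. The first call after compiling is slow while inductor generates kernels, so a warm-up frame before timing is a reasonable precaution.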

Performance (RTX 4090, r2b_robotarm dataset)

Stage	Time
Vision Encoder (BF16)	~56 ms
Text Encoder (cached)	~0 ms
Decoder (compile + BF16)	~34 ms
Preprocess + postprocess	~30 ms
ROS 2 E2E	~130 ms (~7.5 fps)
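
The arithmetic behind the table can be checked with a few lines. The stage values are copied from the table. The reading of the remainder as per-frame overhead outside the listed stages is an assumption, since the source does not say where that time goes.

```python
# Per-stage latencies in milliseconds, copied from the table above.
STAGE_MS = {
    "vision_encoder": 56.0,
    "text_encoder_cached": 0.0,
    "decoder": 34.0,
    "pre_post_process": 30.0,
}
E2E_MS = 130.0  # ROS 2 end-to-end latency from the table


def fps_from_latency(latency_ms: float) -> float:
    """Throughput of a strictly sequential pipeline: one frame per latency period."""
    return 1000.0 / latency_ms


stage_total_ms = sum(STAGE_MS.values())   # 120.0 ms across the listed stages
overhead_ms = E2E_MS - stage_total_ms     # 10.0 ms not attributed to any stage
fps = fps_from_latency(E2E_MS)            # about 7.7 frames per second
```

A 130 ms frame time works out to roughly 7.7 fps, close to the ~7.5 fps quoted above. The vision encoder accounts for almost half of the listed stage time, which is why it is the main target of the optional TRT path.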

Quickstart

Prerequisites

NVIDIA GPU with CUDA 12+
Docker with --runtime=nvidia
HuggingFace account with access to facebook/sam3 (gated repo — request access first)

1. Clone & Download Checkpoint

git clone https://github.com/eclipse0922/isaac_ros_image_segmentation.git
cd isaac_ros_image_segmentation

# Login to HuggingFace and download sam3.pt (~3.3 GB)
pip install huggingface_hub
huggingface-cli login
mkdir -p models/sam3
huggingface-cli download facebook/sam3 sam3.pt --local-dir models/sam3

2. Build Docker Image

docker build \
  -f isaac_ros_segment_anything3/docker/Dockerfile.pytorch \
  -t sam3_pytorch:latest .

Base: NGC PyTorch 26.01 (Ubuntu 24.04, CUDA 13.1, PyTorch 2.10, TensorRT 10.14) + ROS 2 Jazzy.

3. Run Foxglove Demo (with robot arm video)

# On host — streams segmentation overlay to Foxglove Studio on port 8765
./run_foxglove_demo.sh \
  --bag datasets/r2bdataset2024_v1/r2b_robotarm \
  --topic /camera_1/color/image_raw \
  --prompt "robot arm"

Open Foxglove Studio → connect to ws://localhost:8765 → subscribe to /sam3/overlay.

The demo needs the r2b_robotarm dataset (~1.4 GB MCAP), which can be downloaded via NGC before running it:
./download_dataset_ngc.sh

Standalone Test (no ROS 2)

docker run --runtime=nvidia --rm \
  -v "$(pwd)":/ws -w /ws \
  sam3_pytorch:latest \
  python3 isaac_ros_segment_anything3/scripts/test_sam3_pytorch.py \
    --checkpoint models/sam3/sam3.pt \
    --image /path/to/image.jpg \
    --prompt "cat" \
    --precision bf16

ROS 2 Node

Package: isaac_ros_segment_anything3
Node: sam3_node.py

Key Parameters

Parameter	Default	Description
`model_checkpoint_path`	`/tmp/models/sam3.pt`	Path to `sam3.pt`
`text_prompts`	`""`	Comma-separated text prompts; can also be changed at runtime via the `/sam3/set_text_prompt` service
`confidence_threshold`	`0.3`	Detection score threshold
`pytorch_amp_bf16`	`True`	BF16 autocast for full pipeline
`pytorch_compile_decoder`	`True`	`torch.compile` on decoder
`pytorch_compile_vision`	`False`	`torch.compile` on vision encoder
`image_size`	`1008`	Input resolution (SAM3 native: 1008×1008)
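
Since `text_prompts` is a single comma-separated string, the node has to split it into individual prompts somewhere. One way that could look is sketched below. `parse_text_prompts` is a hypothetical helper written for illustration, and the node's real parsing (whitespace handling, duplicates) may differ.

```python
def parse_text_prompts(raw: str) -> list[str]:
    """Split a comma-separated parameter string into clean, unique prompts.

    Surrounding whitespace is stripped, empty entries are dropped, and
    duplicates are removed while keeping first-seen order.
    """
    prompts: list[str] = []
    for part in raw.split(","):
        prompt = part.strip()
        if prompt and prompt not in prompts:
            prompts.append(prompt)
    return prompts
```

With this helper, `"robot arm, person"` becomes `["robot arm", "person"]`, and the default empty string becomes an empty list.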

Runtime Prompt Update

ros2 service call /sam3/set_text_prompt \
  isaac_ros_segment_anything3_interfaces/srv/SetTextPrompt \
  "{text_prompts: ['robot arm', 'person']}"

Topics

Topic	Type	Description
`/image_raw` (sub)	`sensor_msgs/Image`	Input RGB image
`/sam3/raw_segmentation_mask` (pub)	`sensor_msgs/Image`	Binary mask
`/sam3/overlay` (pub)	`sensor_msgs/Image`	Visualization overlay
`/sam3/timing` (pub)	`Sam3Timing`	Per-stage latency

Architecture Notes

SAM3 uses a ViT-H vision backbone with an FPN neck (4 feature scales) and a grounding decoder that takes text embeddings as queries. Unlike SAM1/2, there is no point/box prompt — everything goes through text.

Key implementation details:

Image resolution: 1008×1008 (stretch resize, no letterbox)
Normalization: mean=std=[0.5, 0.5, 0.5]
Text encoding via built-in CLIP tokenizer (model.backbone.forward_text())
Decoder outputs 200 query slots with normalized [cx, cy, w, h] boxes
Scoring: sigmoid(pred_logits) × sigmoid(presence_logits) > threshold
Geometry encoder CLS token requires cross-attention with image features before the main decoder
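
The normalization, scoring, and box conventions above can be written out in plain Python. This is a sketch under stated assumptions, not the node's code: all function names are hypothetical, and a single presence logit is assumed to be shared by every query slot. Because the input is stretch-resized rather than letterboxed, normalized boxes can be scaled directly by the original image width and height.

```python
import math


def normalize_pixel(value: int) -> float:
    """Map an 8-bit channel value to [-1, 1] using mean = std = 0.5."""
    return (value / 255.0 - 0.5) / 0.5


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def query_score(pred_logit: float, presence_logit: float) -> float:
    """Combined score: sigmoid(pred_logit) * sigmoid(presence_logit)."""
    return sigmoid(pred_logit) * sigmoid(presence_logit)


def box_to_pixels(box, width: int, height: int):
    """Convert a normalized [cx, cy, w, h] box to pixel (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * width,
        (cy - h / 2) * height,
        (cx + w / 2) * width,
        (cy + h / 2) * height,
    )


def keep_detections(pred_logits, presence_logit, boxes, width, height,
                    threshold: float = 0.3):
    """Return (score, pixel_box) pairs for queries scoring above the threshold.

    Assumption: one presence logit applies to all query slots.
    """
    kept = []
    for logit, box in zip(pred_logits, boxes):
        score = query_score(logit, presence_logit)
        if score > threshold:
            kept.append((score, box_to_pixels(box, width, height)))
    return kept
```

The default `threshold` matches the `confidence_threshold` parameter default of 0.3. Multiplying by the presence term means a confident query is still suppressed when the model judges that the prompted concept is absent from the image.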

Future Work

C++ NITROS zero-copy: The vision encoder is already exported as a TRT engine (.pt2). A C++ ComposableNode wrapping it with NITROS image transport would enable zero-copy GPU input, which is the natural Isaac ROS path. The decoder currently cannot be TRT-exported (pin_memory is not supported in torch.export), so it would remain in Python, but the image-to-features path could be fully GPU-resident.
Decoder TRT export: If the upstream SAM3 decoder removes the pin_memory dependency, full TRT deployment via torch_tensorrt becomes straightforward.
Dynamic batch / multi-camera: The current node is single-image, single-camera. Batching across multiple camera streams would improve GPU utilization.

Original NVIDIA Packages

The packages below are from the original NVIDIA Isaac ROS release and remain unchanged in this fork. Refer to the upstream documentation for usage.

Package	Model	Description
`isaac_ros_unet`	U-Net	Semantic segmentation via TensorRT
`isaac_ros_segformer`	Segformer	Transformer-based segmentation
`isaac_ros_segment_anything`	SAM	Prompt-based segmentation (SAM1)
`isaac_ros_segment_anything2`	SAM2	Video object segmentation

License

Apache 2.0 — see LICENSE. SAM3 model weights are subject to Meta's model license.
