Real-time stereo depth estimation and interactive 3D point cloud visualization using Fast-FoundationStereo, with optional SAM2 object tracking and 6D oriented bounding box estimation.
Built on top of Fast-FoundationStereo (CVPR 2026).
- Real-time stereo depth with FFS zero-shot generalization (~7-10 FPS on RTX 3090)
- Multiple camera support: USB stereo cameras, Intel RealSense D415, OAK-D Lite
- Stereo calibration pipeline: ChArUco-based calibration for custom stereo rigs
- SAM2 interactive tracking: Click or drag to select objects, real-time mask tracking
- 3D point cloud visualization: Interactive Open3D viewer with camera frustum
- 6D bounding box: PCA-based oriented bounding box with temporal smoothing on tracked objects
| Camera | Script | Notes |
|---|---|---|
| USB stereo (side-by-side) | ffsd_demos/stereo_ffs_realtime.py |
Requires stereo calibration |
| Intel RealSense D415 | ffsd_demos/d415_ffs_realtime.py |
Uses IR stereo pair + RGB colorization |
| OAK-D Lite | ffsd_demos/oak_ffs_realtime.py |
Uses DepthAI SDK |
conda create -n ffs python=3.12 && conda activate ffs
pip install torch==2.6.0 torchvision==0.21.0 xformers --index-url https://download.pytorch.org/whl/cu124git clone https://github.com/Vector-Wangel/Fast-FoundationStereoPose.git
cd Fast-FoundationStereoPosepip install -r requirements.txt
pip install open3d pyyaml
# For RealSense:
pip install pyrealsense2
# For OAK-D:
pip install depthaiDownload the entire 23-36-37 folder from the official link and place under weights/:
weights/23-36-37/
├── cfg.yaml
└── model_best_bp2_serialize.pth
The SAM2 source code (including CameraPredictor for real-time streaming) is already included in SAM2_streaming/. You only need to download the checkpoint:
mkdir -p SAM2_streaming/checkpoints/sam2.1
wget -P SAM2_streaming/checkpoints/sam2.1 https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_small.ptFor USB stereo cameras that output side-by-side frames, you need to calibrate first.
python calibration/diag_charuco.pyVerifies that the camera can detect your ChArUco board and shows detection results.
python calibration/collect_cali.py- Point the camera at the ChArUco board from various angles and distances
- Press Space to capture (only saves when board is detected)
- Press q to quit
- Aim for 15-30 image pairs
Images are saved to calibration/calib_imgs/left/ and calibration/calib_imgs/right/.
python calibration/calibrate.pyGenerates calibration/stereo_calib.yaml with intrinsics, distortion, rectification maps, and baseline.
python ffsd_demos/stereo_ffs_realtime.pyDisplays a real-time interactive 3D point cloud in an Open3D window.
python ffsd_demos/d415_ffs_realtime.pyUses IR stereo pairs for depth estimation with RGB colorization.
python ffsd_demos/oak_ffs_realtime.pypython ffsd_demos/sam2_rgb_demo.py- Drag to draw a bounding box around the target
- Click to select a point on the target
- r to reset tracking
- q to quit
python ffsd_demos/combined_sam2_stereo.pyTwo windows:
- Left (OpenCV): RGB image with SAM2 mask overlay and mouse interaction
- Right (Open3D): 3D point cloud with tracked object highlighted in red + green oriented bounding box
Controls (focus on OpenCV window):
- Drag: Select target with bounding box
- Click: Select target with point
- r: Reset tracking
- q: Quit
Key parameters in the demo scripts that you may want to tune:
| Parameter | Default | Description |
|---|---|---|
VALID_ITERS |
6 | FFS refinement iterations (lower = faster, less accurate) |
MAX_DISP |
192 | Maximum disparity (increase for close objects) |
PCD_STRIDE |
2 | Point cloud downsampling (higher = fewer points, faster) |
ZFAR |
5.0 | Maximum depth in meters |
ZNEAR |
0.05 | Minimum depth in meters |
- Camera images are rotated 180 degrees in some scripts (for upside-down mounted cameras). Remove
cv2.rotate(..., cv2.ROTATE_180)if your camera is right-side up. - SAM2's
fill_hole_areais set to 0 to bypass the_C.soCUDA extension, which requires GLIBC >= 2.32. This is a cosmetic-only limitation. - FFS uses the NVIDIA non-commercial license. See
LICENSE.txt.
- Fast-FoundationStereo (NVIDIA, CVPR 2026)
- SAM 2 (Meta, ECCV 2024)