# Triple Depth2Object Position

A ROS2 Humble package for real-time 3D object detection and mapping using dual Intel RealSense D435 depth cameras with AI-powered object recognition via GroundingDINO.
## Features
- Dual Camera System: Synchronized operation of two Intel RealSense D435 cameras
- 3D Object Detection: Real-time object detection with 3D position estimation
- AI-Powered Recognition: GroundingDINO model for robust object detection
- Stereo Calibration: Precise camera-to-camera calibration using ChArUco boards
- Base Frame Calibration: Camera-to-robot base frame transformation
- Point Cloud Fusion: Merged point cloud from multiple cameras
- Socket Interface: External application integration via TCP socket
- RViz Visualization: Real-time visualization of cameras, objects, and point clouds
## Hardware Requirements
- 2x Intel RealSense D435 cameras
- NVIDIA Jetson Orin (or compatible CUDA-capable system)
- Minimum 8GB RAM (16GB recommended)
- USB 3.0 ports for cameras
## Software Requirements
- Ubuntu 20.04/22.04
- ROS2 Humble
- Python 3.8+
- CUDA 11.4+ (for GPU acceleration)
- PyTorch 1.13+ with CUDA support
## Installation

### Install Dependencies

```bash
# Install ROS2 dependencies
sudo apt update
sudo apt install ros-humble-realsense2-camera ros-humble-realsense2-description

# Install Python packages
pip3 install opencv-contrib-python==4.7.0.72
pip3 install numpy scipy pyyaml
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip3 install transformers pillow matplotlib
```

### Download the GroundingDINO Model
```bash
# Create model directory
sudo mkdir -p /opt/models/groundingdino/

# Download model files
cd /opt/models/groundingdino/
sudo wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
sudo wget https://raw.githubusercontent.com/IDEA-Research/GroundingDINO/main/groundingdino/config/GroundingDINO_SwinT_OGC.py
```

### Build the Package
```bash
# Clone the repository
cd ~/ros2_ws/src
git clone https://github.com/yourusername/triple_depth2object_position.git

# Build the package
cd ~/ros2_ws
colcon build --packages-select triple_depth2object_position
source install/setup.bash

# Create a directory for calibration data
mkdir -p ~/ros2_ws/calibration_data
```

## Camera Configuration

The package is configured for the following camera serial numbers by default:
- Left Camera: `419622073822`
- Right Camera: `033422072712`
To use different cameras, update the serial numbers in the launch files.
## Calibration
### Generate the Calibration Board

```bash
# Generate ChArUco calibration board PDF
python3 ~/ros2_ws/src/triple_depth2object_position/scripts/generate_charuco_board_pdf.py
```

Print the generated board (`charuco_board_4x6.pdf`).

### Stereo Calibration

Calibrate the relative position between the two cameras:

```bash
ros2 launch triple_depth2object_position charuco_calibration.launch.py mode:=stereo
```

- Show the ChArUco board to both cameras simultaneously
- Press SPACE to capture frames (collect 50-200 samples)
- Press 'c' to run calibration
- Press 's' to save calibration
- Press ESC to exit
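Stereo calibration produces the extrinsics between the two cameras: a rotation `R` and translation `t` that map a point from one camera's frame into the other's. A minimal sketch of applying them, with made-up values (the actual storage format of the saved calibration is not shown here):

```python
import numpy as np

# Hypothetical extrinsics from stereo calibration: rotation R and translation t
# mapping points from the right-camera frame into the left-camera frame,
# i.e. p_left = R @ p_right + t. These values are illustrative only.
R = np.array([
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
])  # 90-degree rotation about the Y axis
t = np.array([0.5, 0.0, 0.0])  # cameras 0.5 m apart along X

def right_to_left(p_right: np.ndarray) -> np.ndarray:
    """Transform a 3D point from the right-camera frame to the left-camera frame."""
    return R @ p_right + t

p = right_to_left(np.array([0.0, 0.0, 2.0]))
print(p)  # a point 2 m in front of the right camera, expressed in the left frame
```

The same form of rigid transform is chained again by the base-frame calibration below to express detections in robot coordinates.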
### Base Frame Calibration

Calibrate the cameras to the robot base frame:

```bash
# Measure the board position in the base frame (example values in meters)
ros2 launch triple_depth2object_position charuco_calibration.launch.py mode:=base \
    board_x:=0.5637134 board_y:=-0.1516032 board_z:=0.1560714
```

## Running the System

```bash
# Launch the complete object detection system
ros2 launch triple_depth2object_position object_detection.launch.py

# In another terminal, send detection queries
ros2 topic pub /object_query std_msgs/String "data: '[\"bottle\", \"can\", \"cup\"]'" --once

# Launch RViz for visualization
ros2 launch triple_depth2object_position visualization.launch.py
```

## Launch Files

| Launch File | Description |
|---|---|
| `cameras.launch.py` | Launch both RealSense cameras |
| `charuco_calibration.launch.py` | Camera calibration (stereo or base mode) |
| `mapping.launch.py` | 3D point cloud mapping |
| `object_detection.launch.py` | Complete object detection system |
| `visualization.launch.py` | RViz visualization |
## Topics

- `/camera_left/camera/color/image_raw` - Left RGB image
- `/camera_right/camera/color/image_raw` - Right RGB image
- `/camera_left/camera/depth/points` - Left point cloud
- `/camera_right/camera/depth/points` - Right point cloud
- `/object_query` (std_msgs/String) - Objects to detect (JSON array)
- `/object_detection_result` (std_msgs/String) - Detection results (JSON)
- `/merged_pointcloud` (sensor_msgs/PointCloud2) - Combined point cloud
- `/camera_frames` (visualization_msgs/MarkerArray) - Camera visualizations
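The `/object_query` payload is a JSON array of object labels carried inside a `std_msgs/String`. Building the payload in Python avoids the shell-escaping pitfalls of the CLI form:

```python
import json

# Serialize the list of target labels into the JSON array expected on /object_query
labels = ["bottle", "can", "cup"]
payload = json.dumps(labels)
print(payload)  # '["bottle", "can", "cup"]'

# Equivalent CLI call, for reference:
#   ros2 topic pub /object_query std_msgs/String "data: '[\"bottle\", \"can\", \"cup\"]'" --once
```

In an rclpy node, `payload` would go into the `data` field of the `std_msgs/String` message before publishing.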
## Socket Interface

For external application integration, a TCP socket server accepts JSON detection requests:

```python
import socket
import json

# Connect to the detection server
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(('localhost', 5000))

# Send detection request (sendall avoids partial writes)
request = {"objects": ["bottle", "can", "cup"]}
client.sendall(json.dumps(request).encode())

# Receive results
response = client.recv(4096).decode()
result = json.loads(response)
print(f"Detected objects: {result}")
client.close()
```

## Custom Messages

`DetectedObject3D`:

```
string label
geometry_msgs/Point position
float32 confidence
string source_camera
```

Detection result:

```
DetectedObject3D[] detected_objects
string[] not_found_objects
```
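For consumers outside ROS (e.g. the socket interface above), the detection result arrives as JSON rather than as ROS messages. A sketch of mirroring `DetectedObject3D` in plain Python; the field names come from the message definition, but the exact JSON schema is an assumption:

```python
import json
from dataclasses import dataclass

@dataclass
class DetectedObject3D:
    """Plain-Python mirror of the DetectedObject3D ROS message."""
    label: str
    position: tuple  # (x, y, z) in meters
    confidence: float
    source_camera: str

def parse_result(raw: str) -> list:
    """Parse a detection-result JSON string (schema assumed, not normative)."""
    data = json.loads(raw)
    return [
        DetectedObject3D(
            label=obj["label"],
            position=(obj["position"]["x"], obj["position"]["y"], obj["position"]["z"]),
            confidence=obj["confidence"],
            source_camera=obj["source_camera"],
        )
        for obj in data.get("detected_objects", [])
    ]

raw = ('{"detected_objects": [{"label": "bottle", '
       '"position": {"x": 0.4, "y": -0.1, "z": 0.2}, '
       '"confidence": 0.87, "source_camera": "camera_left"}], '
       '"not_found_objects": ["cup"]}')
objs = parse_result(raw)
print(objs[0].label, objs[0].position)
```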
## Configuration

Edit `config/camera_config.yaml` to adjust:

- Camera resolution
- Frame rate
- Depth settings
- Point cloud filters

Edit `config/object_detection_config.yaml` to configure:

- Detection confidence threshold
- Depth filtering parameters
- AI model settings

Edit `config/charuco_board.yaml` to change the board specifications.
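These files are standard YAML and can be inspected from Python with PyYAML. The keys below are hypothetical (the real key names live in the files themselves); the snippet only illustrates the loading pattern:

```python
import yaml

# Hypothetical excerpt of config/camera_config.yaml -- the actual key names
# may differ; this only shows how such a file is typically read.
example = """
camera:
  width: 848
  height: 480
  fps: 30
depth:
  min_range: 0.1
  max_range: 5.0
"""

config = yaml.safe_load(example)
print(config["camera"]["fps"], config["depth"]["max_range"])
```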
## Troubleshooting

### Cameras Not Detected

```bash
# Check if cameras are connected; both should appear with their serial numbers
rs-enumerate-devices
```

### GPU / CUDA Issues

```bash
# Verify CUDA installation
python3 -c "import torch; print(torch.cuda.is_available())"

# Check GPU memory
nvidia-smi
```

### Calibration Quality

- Ensure good lighting conditions
- Keep board flat and clearly visible
- Collect diverse board poses
- Verify board measurements match config
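A quick way to sanity-check a calibration is the RMS reprojection error between observed corner pixels and their reprojections. The arrays below are made up; in practice they come from the calibration routine:

```python
import numpy as np

# Observed ChArUco corner pixels vs. their reprojected positions (illustrative values)
observed = np.array([[100.0, 200.0], [150.0, 210.0], [120.0, 260.0]])
reprojected = np.array([[100.5, 199.5], [149.0, 211.0], [120.0, 260.5]])

# RMS of the per-corner Euclidean pixel errors
err = np.sqrt(np.mean(np.sum((observed - reprojected) ** 2, axis=1)))
print(f"RMS reprojection error: {err:.3f} px")
```

As a rule of thumb, a well-collected dataset yields sub-pixel RMS error; much larger values suggest bad board measurements or too few poses.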
## Performance

- Processing time: ~2-3 seconds per query
- Detection range: 0.1 m - 5.0 m
- Accuracy depends on calibration quality
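Detections outside the stated 0.1 m - 5.0 m range are unreliable and are best discarded. A minimal sketch of such a range filter with NumPy (points and thresholds are illustrative):

```python
import numpy as np

# Keep only 3D points whose distance from the camera lies in the detection range
MIN_RANGE, MAX_RANGE = 0.1, 5.0

points = np.array([
    [0.0, 0.0, 0.05],  # too close
    [0.2, 0.1, 1.5],   # in range
    [1.0, 2.0, 6.0],   # too far
])

dist = np.linalg.norm(points, axis=1)
mask = (dist >= MIN_RANGE) & (dist <= MAX_RANGE)
kept = points[mask]
print(kept)  # only the in-range point remains
```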
## Project Structure

```
triple_depth2object_position/
├── dual_camera_system/              # Core Python package
│   ├── detection/                   # Object detection modules
│   │   ├── object_detection_3d.py
│   │   └── grounding_dino_wrapper.py
│   ├── mapping/                     # Point cloud processing
│   │   └── pointcloud_merger.py
│   ├── visualization/               # RViz visualizations
│   │   ├── marker_visualizer.py
│   │   └── camera_frame_publisher.py
│   ├── charuco_stereo_calibration.py
│   └── charuco_base_calibration.py
├── launch/                          # ROS2 launch files
├── config/                          # Configuration files
├── msg/                             # Custom message definitions
├── scripts/                         # Utility scripts
├── package.xml                      # ROS2 package manifest
├── setup.py                         # Python package setup
└── CMakeLists.txt                   # Build configuration
```
## Contributing

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request
## License

Apache License 2.0 - see the LICENSE file for details.
## Acknowledgments

- Intel RealSense SDK
- GroundingDINO by IDEA Research
- ROS2 Community
## Support

For issues and questions:
- Create an issue on GitHub
- Check existing documentation in the repository
## Citation

If you use this package in your research, please cite:

```bibtex
@software{triple_depth2object,
  title = {Triple Depth2Object Position: ROS2 Package for 3D Object Detection},
  year  = {2024},
  url   = {https://github.com/yourusername/triple_depth2object_position}
}
```