This project develops a framework for 3D scene reconstruction to support autonomous robot navigation in dynamic environments. It combines object detection, depth mapping, and path planning to improve robot perception and decision-making.
The system builds a detailed 3D representation of an environment from video data so that a robot can classify and localize obstacles accurately, then plans safe and efficient paths through complex settings.
We used the ScanNet sensor dataset, specifically scene0000, containing:
- 5,578 frames of RGB images
- Depth maps
- Camera pose information
The project is structured around the following objectives:
- Depth Estimation: Using MiDaS for accurate depth mapping from RGB images.
- 3D Scene Reconstruction: Integrating RGB-D data and camera poses.
- Object Detection: Leveraging YOLOv8 Nano for real-time object detection.
- Instance Segmentation: Using MobileSAM for segmenting and tracking individual objects.
- 3D Object Mapping: Projecting objects into the 3D scene for spatial context.
- Bird’s-Eye View Generation: Simplifying 3D data into a 2D representation.
- Optimal Path Planning: Computing obstacle-free paths in the 3D environment.
Depth Estimation:
- Model Used: MiDaS
- Outcome: Predicted depth maps were compared against the true depth values to validate the accuracy of the approach.
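
The comparison between predicted and true depth can be sketched as follows. MiDaS is generally described as predicting relative inverse depth, so this sketch assumes the prediction must first be aligned to the ground truth with a least-squares scale and shift. The function names, the use of 0 as the "missing depth" value, and the absolute relative error metric are illustrative assumptions, not the project's confirmed evaluation code.

```python
import numpy as np

def align_scale_shift(pred_inv_depth, gt_depth, min_depth=1e-3):
    """Fit s, t so that s * pred + t approximates 1 / gt, then return depth."""
    # Assumption: pixels with no sensor reading are stored as 0 in gt_depth.
    valid = gt_depth > min_depth
    x = pred_inv_depth[valid].astype(np.float64)
    y = 1.0 / gt_depth[valid].astype(np.float64)
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    # Clip to avoid dividing by zero or negative inverse depth.
    aligned_inv = np.clip(s * pred_inv_depth + t, 1e-6, None)
    return 1.0 / aligned_inv, valid

def abs_rel_error(pred_depth, gt_depth, valid):
    """Mean of |pred - gt| / gt over valid pixels."""
    diff = np.abs(pred_depth[valid] - gt_depth[valid])
    return float(np.mean(diff / gt_depth[valid]))
```

A lower absolute relative error means the aligned prediction tracks the true depth more closely. Other metrics (RMSE, threshold accuracy) could be substituted in the same place.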
3D Scene Reconstruction:
- Used the true depth images from the dataset, rather than predicted depth, for higher accuracy.
- Integrated RGB-D data and camera poses into a point cloud and mesh representation.
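
As a rough illustration of how a depth map and a camera pose combine into world-frame points, the sketch below back-projects each valid pixel through a pinhole model. It assumes a 3x3 intrinsics matrix `K`, depth in metres with 0 marking missing pixels, and a 4x4 camera-to-world pose. These conventions and the function name are assumptions for illustration and should be checked against the dataset's documentation.

```python
import numpy as np

def depth_to_world_points(depth, K, cam_to_world):
    """Back-project a depth map to an (N, 3) array of world-frame points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel columns and rows
    valid = depth > 0                               # assumption: 0 = no reading
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)  # homogeneous coords
    return (pts_cam @ cam_to_world.T)[:, :3]
```

Concatenating the output over many frames gives a scene-level point cloud. Meshing that cloud is a separate step not shown here.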
Object Detection:
- Model Used: YOLOv8 Nano
- Classes Detected: Common objects such as chairs, tables, and sofas.
- Techniques: Confidence thresholding and non-maximum suppression (NMS).
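
To make the two filtering techniques concrete, here is a minimal NumPy sketch of confidence thresholding followed by greedy non-maximum suppression. Detection libraries usually apply these steps internally, so this is only an illustration. The function names and the threshold values (0.25 and 0.45) are assumptions, and the suppression here is class-agnostic.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one [x1, y1, x2, y2] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def filter_detections(boxes, scores, conf_thresh=0.25, iou_thresh=0.45):
    """Return indices of boxes kept after thresholding and greedy NMS."""
    idx = np.flatnonzero(scores >= conf_thresh)   # drop low-confidence boxes
    order = idx[np.argsort(-scores[idx])]         # highest score first
    kept = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        kept.append(int(best))
        # Discard remaining boxes that overlap the chosen box too much.
        order = rest[iou_one_to_many(boxes[best], boxes[rest]) <= iou_thresh]
    return kept
```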
Instance Segmentation:
- Model Used: MobileSAM
- Generated binary masks aligned with object shapes for use in 3D mapping.
3D Object Mapping:
- Projected segmented objects into the 3D scene with unique visual indicators.
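
One way to place a segmented object in 3D is to lift only the pixels inside its binary mask, then summarise them with a centroid and an axis-aligned bounding box. The sketch below does this in the camera frame under a pinhole model. The function names, the summary format, and the use of 0 for missing depth are hypothetical choices for illustration, and the mask is assumed to contain at least one pixel with valid depth.

```python
import numpy as np

def mask_to_camera_points(mask, depth, K):
    """Lift the masked pixels with valid depth to (N, 3) camera-frame points."""
    v, u = np.nonzero(mask.astype(bool) & (depth > 0))  # rows, columns
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)

def object_summary(points):
    """Centroid and axis-aligned bounds of a non-empty (N, 3) point set."""
    return {
        "centroid": points.mean(axis=0),
        "min": points.min(axis=0),
        "max": points.max(axis=0),
    }
```

Applying the frame's camera pose to these points would move the object into the shared scene frame, where each object can be drawn with its own colour or marker.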
Bird’s-Eye View Generation:
- Created 2D occupancy grids from 3D point clouds for simplified spatial visualization.
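
A simple way to turn a point cloud into an occupancy grid is to keep points within a height band and mark the grid cell each one falls into. The sketch below assumes the z axis points up and that coordinates are in metres. The cell size and height limits are placeholder values, not the project's settings.

```python
import numpy as np

def occupancy_grid(points, cell_size=0.05, z_min=0.1, z_max=1.8):
    """Project points in a height band onto the XY plane (True = occupied)."""
    # Keep only points at obstacle height, skipping floor and ceiling.
    band = points[(points[:, 2] >= z_min) & (points[:, 2] <= z_max)]
    origin = points[:, :2].min(axis=0)            # world XY of cell (0, 0)
    extent = points[:, :2].max(axis=0) - origin
    shape = np.floor(extent / cell_size).astype(int) + 1
    grid = np.zeros(shape, dtype=bool)
    ij = np.floor((band[:, :2] - origin) / cell_size).astype(int)
    grid[ij[:, 0], ij[:, 1]] = True
    return grid, origin
```

The returned `origin` lets grid indices be converted back to world coordinates as `origin + index * cell_size`.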
Optimal Path Planning:
- Used a pathfinding algorithm to compute the optimal path between points of interest.
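
The specific pathfinding algorithm is not named above. As one possible illustration, the sketch below uses A* with a Manhattan-distance heuristic on a 2D grid where `True` marks a blocked cell. The choice of A*, the 4-connected moves, and the unit step cost are all assumptions.

```python
import heapq

def astar(grid, start, goal):
    """Shortest 4-connected path as a list of (row, col) cells, or None."""
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance never overestimates on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {}
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = [cell]
            while cell in came_from:          # walk back to the start
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost[cell]:
            continue                          # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and not grid[nr][nc]:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None
```

In practice the obstacles are often inflated by the robot's radius before planning, so that the returned path keeps a safe clearance.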