-
VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
-
3D Fully Convolutional Network for Vehicle Detection in Point Cloud
-
Online Learning for Human Classification in 3D LiDAR-based Tracking
- hdl_people_tracking is a ROS package for real-time people tracking using a 3D LIDAR
- Online learning for human classification in 3D LiDAR-based tracking: This is a ROS-based online learning framework for human classification in 3D LiDAR scans, taking advantage of robust multi-target tracking to avoid the need for data annotation by a human expert.
- L-CAS 3D Point Cloud People Dataset
-
PointPillars: Fast Encoders for Object Detection from Point Clouds
- Implementation of PointPillars in PyTorch for KITTI 3D Object Detection
- Point Pillars (3D Object Detection) Through Explanation with Code
- PointPillars: This repo demonstrates how to reproduce the results from PointPillars: Fast Encoders for Object Detection from Point Clouds (to be published at CVPR 2019) on the KITTI dataset by making the minimum required changes from the preexisting open source codebase SECOND.
- MMDetection3D is an open source object detection toolbox based on PyTorch, towards the next-generation platform for general 3D detection.
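To make the pillar idea concrete, here is a minimal NumPy sketch of the grouping step PointPillars builds on: points are binned into x-y grid cells ("pillars"), each point is augmented with its offset from the pillar centroid, and a per-pillar feature is produced by pooling. The function name `pillarize` and the max-pool are illustrative only; the real encoder uses a learned PointNet-style layer.

```python
import numpy as np

def pillarize(points, grid_size=0.5, max_pts=32):
    """Toy pillar encoder: group points into x-y cells and max-pool a feature.

    points: (N, 3) array of x, y, z coordinates.
    Returns a dict mapping pillar index (ix, iy) -> pooled (6,) feature.
    This is a sketch of the PointPillars grouping idea, not the real network.
    """
    idx = np.floor(points[:, :2] / grid_size).astype(int)
    pillars = {}
    for key in {tuple(k) for k in idx}:
        mask = np.all(idx == key, axis=1)
        pts = points[mask][:max_pts]
        # augment each point with its offset from the pillar centroid
        centroid = pts.mean(axis=0)
        feat = np.hstack([pts, pts - centroid])   # (n, 6)
        pillars[key] = feat.max(axis=0)           # max-pool over points
    return pillars
```

The resulting pillar grid is what gets scattered into a 2D pseudo-image for the downstream 2D CNN backbone.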
-
LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention
- Core Idea: This paper introduces an end-to-end 3D video object detector for LiDAR point clouds. The framework has two main components:
- Pillar Message Passing Network (PMPNet): Encodes spatial features within each frame by constructing a graph over pillars, where each pillar is treated as a node, allowing information sharing across neighboring pillars.
- Attentive Spatiotemporal Transformer GRU (AST-GRU): Aggregates temporal information across frames using Transformer attention mechanisms to align and emphasize foreground objects, improving detection across sequences.
- Dataset: Uses the nuScenes dataset, a large-scale dataset with continuous LiDAR point clouds for autonomous driving.
- Input/Output: The input is sequential LiDAR point cloud frames, and the output is 3D bounding boxes with object classifications.
- Relevance: Highly relevant, especially for tasks involving tracking. The AST-GRU’s spatiotemporal feature aggregation can help maintain consistent human tracking across frames.
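The graph-based message passing in PMPNet can be sketched roughly as follows. This is a toy NumPy version: `message_passing_step` and the max-aggregation are stand-ins for the paper's learned MLP/GRU node updates.

```python
import numpy as np

def message_passing_step(feats, coords, k=2):
    """One toy message-passing iteration over pillar nodes.

    feats:  (N, C) node features; coords: (N, 2) pillar centers.
    Each node aggregates the element-wise max of its k nearest
    neighbours' features and adds it to its own feature.
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude self from neighbours
    nbrs = np.argsort(d, axis=1)[:, :k]      # indices of k nearest neighbours
    msgs = feats[nbrs].max(axis=1)           # aggregate neighbour features
    return feats + msgs
```

Stacking several such iterations lets information propagate beyond immediate neighbours, which is the point of the graph formulation versus independent pillar encoding.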
-
Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving
- Core Idea: This paper proposes the Temporal-Channel Transformer (TCTR), designed to process sequential LiDAR frames by encoding temporal information (frame-to-frame) and channel-wise information. The TCTR model has two main parts:
- Temporal-Channel Encoder: Encodes temporal-channel information across frames, capturing dependencies between frame sequences and voxel channels.
- Spatial Decoder: Decodes spatial relationships within each frame to produce a dense representation for the current frame, refined to highlight objects.
- Dataset: Also uses nuScenes, ideal for evaluating video-based LiDAR detection performance.
- Input/Output: Input is consecutive frames converted into 2D pseudo-images (voxels), and output is 3D object bounding boxes.
- Relevance: Relevant for human detection and tracking in sparse data; however, TCTR is more focused on enhancing single-frame representations with past frames, which is less dynamic than PMPNet's graph-based message passing.
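The core operation inside the temporal-channel encoder is attention from the current frame into past frames. A single-head NumPy sketch (shapes and the helper names are assumptions; the paper's model is multi-head and learned):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(query, memory):
    """Single-head scaled dot-product attention across frames.

    query:  (Nq, C) features of the current frame.
    memory: (T, Nk, C) features from T past frames, used as keys/values.
    Returns (Nq, C): current-frame features re-estimated from the past.
    """
    kv = memory.reshape(-1, memory.shape[-1])          # (T*Nk, C)
    scores = query @ kv.T / np.sqrt(query.shape[-1])   # scaled similarities
    return softmax(scores, axis=-1) @ kv               # weighted value sum
```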
-
SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection:
- Core Idea: This paper introduces SA-Det3D, a self-attention module for 3D object detection, which integrates Full Self-Attention (FSA) and Deformable Self-Attention (DSA) to enhance the context-awareness of object detection models in point clouds. It aims to improve detection by allowing the network to capture long-range dependencies in the 3D space without increasing computational costs significantly.
- FSA computes interactions between all points for robust global context.
- DSA reduces computational demand by selectively attending to key points.
- Dataset: It utilizes several large-scale 3D datasets, including KITTI, nuScenes, and Waymo Open Dataset.
- Input/Output:
- Input: LiDAR point clouds, processed into pillars or voxels, with each frame as input.
- Output: 3D bounding boxes and classification labels for detected objects within each frame.
- Relevance to Your Project:
- This paper is relevant if your primary focus is detection, as the self-attention mechanism enhances the model’s ability to distinguish between humans and other objects by learning context across larger distances within a single frame.
- However, SA-Det3D lacks temporal tracking, which is essential for tracking and recognition across multiple frames. The self-attention improves frame-by-frame detection but does not utilize temporal information, limiting its effectiveness for continuous human tracking.
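The FSA/DSA trade-off can be illustrated with a small sketch: instead of attending over all N points (O(N²) pairs), attend only to m sampled key points (O(N·m)). Here farthest point sampling stands in for DSA's learned keypoint selection; all function names are illustrative.

```python
import numpy as np

def farthest_point_sample(coords, m):
    """Pick m spread-out point indices (a common keypoint-sampling heuristic)."""
    chosen = [0]
    dist = np.linalg.norm(coords - coords[0], axis=1)
    for _ in range(m - 1):
        nxt = int(dist.argmax())                # farthest from current set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(coords - coords[nxt], axis=1))
    return np.array(chosen)

def deformable_self_attention(feats, coords, m=4):
    """Attend from every point to only m sampled key points."""
    keys = feats[farthest_point_sample(coords, m)]
    scores = feats @ keys.T / np.sqrt(feats.shape[-1])
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    w = e / e.sum(axis=1, keepdims=True)        # softmax over the m keys
    return w @ keys
```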
-
Point Density-Aware Voxels for LiDAR 3D Object Detection:
- Core Idea: This paper presents Point Density-Aware Voxel Network (PDV), designed to address LiDAR's uneven point density at varying distances. PDV leverages density-aware Region of Interest (RoI) pooling and uses kernel density estimation (KDE) and self-attention to incorporate point density information directly into the detection process. PDV enhances voxel feature localization by calculating voxel centroids for greater spatial accuracy and aggregates point features to better capture object shapes.
- Voxel Point Centroid Localization: Calculates centroids of points in each voxel for more accurate localization.
- Density-Aware RoI Pooling: Uses KDE to capture local density around each grid point, adding density-based positional encoding in the self-attention module.
- Dataset: Evaluated on large-scale autonomous driving datasets, including the Waymo Open Dataset and KITTI.
- Input/Output:
- Input: LiDAR point cloud data with point coordinates and features (e.g., intensity).
- Output: 3D bounding boxes and object classifications, refined based on density awareness.
- Relevance to Your Project:
- Detection Accuracy: PDV's density-aware design improves detection accuracy in varying densities, which can be useful for human detection, especially in crowded scenes or at greater distances where point density may be lower.
- Tracking Limitations: This model focuses on refining object detection using density-based features in single frames. Unlike multi-frame methods, it doesn’t integrate temporal information, which limits its capability for continuous human tracking.
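The KDE component can be sketched as follows: a simplified isotropic Gaussian kernel density estimate at each RoI grid point, in NumPy. PDV's actual formulation and bandwidth choices may differ; this only shows the shape of the computation.

```python
import numpy as np

def local_density(grid_pts, cloud, bandwidth=0.5):
    """Gaussian-kernel density estimate at each query location.

    grid_pts: (G, 3) query locations; cloud: (N, 3) LiDAR points.
    Returns (G,) densities: high where points cluster, low in sparse regions.
    """
    d2 = ((grid_pts[:, None, :] - cloud[None, :, :]) ** 2).sum(-1)
    k = np.exp(-d2 / (2 * bandwidth ** 2))
    norm = (np.sqrt(2 * np.pi) * bandwidth) ** 3 * len(cloud)
    return k.sum(axis=1) / norm
```

A density value like this is what PDV feeds into its positional encoding so the attention module can reason about how well-observed each region is.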
- Explore the dataset to understand its structure (e.g., point cloud format, annotations, metadata).
- Visualize sample point cloud scans
- 3D Detection & Tracking Viewer: 3D detection and tracking viewer (visualization) for kitti & waymo dataset
dat.: dataset | cls.: classification | rel.: retrieval | seg.: segmentation
det.: detection | tra.: tracking | pos.: pose | dep.: depth
reg.: registration | rec.: reconstruction | aut.: autonomous driving
oth.: other, including normal-related, correspondence, mapping, matching, alignment, compression, generative model...
- [ModelNet] The Princeton ModelNet. [cls.]
- [ShapeNet] A collaborative dataset between researchers at Princeton, Stanford and TTIC. [seg.]
- [S3DIS] The Stanford Large-Scale 3D Indoor Spaces Dataset. [seg.]
- [ScanNet] Richly-annotated 3D Reconstructions of Indoor Scenes. [cls.seg.]
- [SUNRGB-D] 19 object categories for predicting a 3D bounding box in real world dimension. [det.]
- [Large-Scale Point Cloud Classification Benchmark (ETH)] This benchmark closes the gap and provides a large labelled 3D point cloud data set of natural scenes with over 4 billion points in total. [cls.]
- [Paris-Lille-3D] A large and high-quality ground truth urban point cloud dataset for automatic segmentation and classification. [cls.seg.]
- [KITTI] The KITTI Vision Benchmark Suite. [det.]
- Waymo Open Dataset
- ONCE 3D Object Detection Baselines
- NuScenes 3D Object Detection
- L-CAS 3D Point Cloud People Dataset
- Sydney Urban Objects Dataset
- IQmulus & TerraMobilita Contest
- [PartNet] The PartNet dataset provides fine-grained part annotation of objects in ShapeNetCore. [seg.]
- [PartNet] PartNet benchmark from Nanjing University and National University of Defense Technology. [seg.]
- [Stanford 3D] The Stanford 3D Scanning Repository. [reg.]
- [UWA Dataset] [cls.seg.reg.]
- [Princeton Shape Benchmark] The Princeton Shape Benchmark.
- [SYDNEY URBAN OBJECTS DATASET] This dataset contains a variety of common urban road objects scanned with a Velodyne HDL-64E LIDAR, collected in the CBD of Sydney, Australia. There are 631 individual scans of objects across classes of vehicles, pedestrians, signs and trees. [cls.match.]
- [ASL Datasets Repository (ETH)] This site is dedicated to providing datasets for the Robotics community with the aim to facilitate result evaluations and comparisons. [cls.match.reg.det.]
- [Robotic 3D Scan Repository] The Canadian Planetary Emulation Terrain 3D Mapping Dataset is a collection of three-dimensional laser scans gathered at two unique planetary analogue rover test facilities in Canada.
- [Radish] The Robotics Data Set Repository (Radish for short) provides a collection of standard robotics data sets.
- [IQmulus & TerraMobilita Contest] The database contains 3D MLS data from a dense urban environment in Paris (France), composed of 300 million points. The acquisition was made in January 2013. [cls.seg.det.]
- [Oakland 3-D Point Cloud Dataset] This repository contains labeled 3-D point cloud laser data collected from a moving platform in an urban environment.
- [Robotic 3D Scan Repository] This repository provides 3D point clouds from robotic experiments, log files of robot runs and standard 3D data sets for the robotics community.
- [Ford Campus Vision and Lidar Data Set] The dataset is collected by an autonomous ground vehicle testbed, based upon a modified Ford F-250 pickup truck.
- [The Stanford Track Collection] This dataset contains about 14,000 labeled tracks of objects as observed in natural street scenes by a Velodyne HDL-64E S2 LIDAR.
- [PASCAL3D+] Beyond PASCAL: A Benchmark for 3D Object Detection in the Wild. [pos.det.]
- [3D MNIST] The aim of this dataset is to provide a simple way to get started with 3D computer vision problems such as 3D shape recognition. [cls.]
- [WAD] This dataset is provided by Baidu Inc.
- [nuScenes] The nuScenes dataset is a large-scale autonomous driving dataset.
- [PreSIL] Depth information, semantic segmentation (images), point-wise segmentation (point clouds), ground point labels (point clouds), and detailed annotations for all vehicles and people. [paper] [det.aut.]
- [3D Match] Keypoint Matching Benchmark, Geometric Registration Benchmark, RGB-D Reconstruction Datasets. [reg.rec.oth.]
- [BLVD] (a) 3D detection, (b) 4D tracking, (c) 5D interactive event recognition and (d) 5D intention prediction. [ICRA 2019 paper] [det.tra.aut.oth.]
- [PedX] 3D Pose Estimation of Pedestrians, more than 5,000 pairs of high-resolution (12MP) stereo images and LiDAR data along with 2D and 3D labels of pedestrians. [ICRA 2019 paper] [pos.aut.]
- [H3D] Full-surround 3D multi-object detection and tracking dataset. [ICRA 2019 paper] [det.tra.aut.]
- [Argoverse BY ARGO AI] Two public datasets (3D Tracking and Motion Forecasting) supported by highly detailed maps to test, experiment, and teach self-driving vehicles how to understand the world around them. [CVPR 2019 paper] [tra.aut.]
- [Matterport3D] RGB-D: 10,800 panoramic views from 194,400 RGB-D images. Annotations: surface reconstructions, camera poses, and 2D and 3D semantic segmentations. Keypoint matching, view overlap prediction, normal prediction from color, semantic segmentation, and scene classification. [3DV 2017 paper] [code] [blog]
- [SynthCity] SynthCity is a 367.9M point synthetic full colour Mobile Laser Scanning point cloud. Nine categories. [seg.aut.]
- [Lyft Level 5] Includes high quality, human-labelled 3D bounding boxes of traffic agents and an underlying HD spatial semantic map. [det.seg.aut.]
- [SemanticKITTI] Sequential Semantic Segmentation, 28 classes, for autonomous driving. All sequences of KITTI odometry labeled. [ICCV 2019 paper] [seg.oth.aut.]
- [The Waymo Open Dataset] The Waymo Open Dataset is comprised of high resolution sensor data collected by Waymo self-driving cars in a wide variety of conditions. [det.]
- [A*3D] An Autonomous Driving Dataset in Challenging Environments. [det.]
- Yochengliu/awesome-point-cloud-analysis: A list of papers and datasets about point cloud analysis (processing)
- timzhang642/3D-Machine-Learning: A resource repository for 3D machine learning
- Dataset from Yulan Guo's Homepage
- Research state-of-the-art models for 3D point cloud detection, recognition, and tracking (e.g., PointPillars, PV-RCNN, CenterPoint).
- Person-MinkUNet: 3D Person Detection with LiDAR Point Cloud
- PV-CNN
- OpenPCDet
- PointRCNN
- Part-A2-Net
- PV-RCNN
- Voxel R-CNN
- PV-RCNN++
- MPPNet
- Contains implementations of models evaluated on 3D object detection for nuScenes
- EA-LSS: Edge-aware Lift-splat-shot Framework for 3D BEV Object Detection
- kaggle competition: Lyft 3D Object Detection for Autonomous Vehicles: contains participants' code
- kaggle competition: KITTI-3D-Object-Detection-Dataset: contains participants' code
- and others
- Compare models based on their strengths, weaknesses, and suitability for the project.
- Select the primary models to implement (e.g., PointPillars for real-time performance, PV-RCNN for accuracy ...).
- Document the rationale behind model selection for future reference.
- Document the data formats required for model training
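As a starting point for that documentation: KITTI label files are plain text with 15 space-separated fields per object (field order per the KITTI devkit). A small parser sketch; `parse_kitti_label` is an illustrative helper name.

```python
def parse_kitti_label(line):
    """Parse one line of a KITTI label file into a dict.

    The 15 fields are: type, truncated, occluded, alpha,
    2D bbox (left, top, right, bottom), dimensions (h, w, l),
    location (x, y, z) in camera coordinates, and rotation_y.
    """
    f = line.split()
    return {
        "type": f[0],
        "truncated": float(f[1]),
        "occluded": int(f[2]),
        "alpha": float(f[3]),
        "bbox_2d": [float(v) for v in f[4:8]],
        "dimensions_hwl": [float(v) for v in f[8:11]],
        "location_xyz": [float(v) for v in f[11:14]],
        "rotation_y": float(f[14]),
    }
```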
- Implement models or find code from GitHub, Papers with Code, and Kaggle.
- Evaluate model performance on the validation set using relevant metrics (e.g., precision, recall, F1 score, IoU).
- Compare results from different models to determine the best-performing architecture.
- Identify and analyze failure cases (e.g., missed detections, misclassifications).
- Log and track model performance metrics for each model tested.
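For the IoU metric above, a minimal axis-aligned 3D IoU can serve as a first sanity check. Note that KITTI/nuScenes evaluation uses yaw-rotated boxes, so this sketch deliberately understates the real computation.

```python
def _volume(box):
    """Volume of an axis-aligned box (xmin, ymin, zmin, xmax, ymax, zmax)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

def iou_3d_axis_aligned(a, b):
    """IoU of two axis-aligned 3D boxes, each (xmin, ymin, zmin, xmax, ymax, zmax)."""
    inter = 1.0
    for i in range(3):
        lo, hi = max(a[i], b[i]), min(a[i + 3], b[i + 3])
        if hi <= lo:
            return 0.0          # no overlap along this axis
        inter *= hi - lo
    return inter / (_volume(a) + _volume(b) - inter)
```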
- Implement multi-object tracking using models like AB3DMOT or integrate tracking into the detection pipeline.
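The association step at the heart of AB3DMOT-style tracking can be sketched with a greedy matcher over a track-detection IoU matrix. AB3DMOT itself uses the Hungarian algorithm plus a per-track Kalman filter; `greedy_associate` is a simplified stand-in showing only data association.

```python
import numpy as np

def greedy_associate(iou_matrix, iou_thresh=0.1):
    """Greedily match tracks (rows) to detections (cols) by descending IoU.

    Each track and detection is used at most once; pairs below
    iou_thresh are left unmatched. Returns a list of (track, det) pairs.
    """
    iou = iou_matrix.astype(float).copy()
    matches = []
    while iou.size and iou.max() > iou_thresh:
        t, d = np.unravel_index(iou.argmax(), iou.shape)
        matches.append((int(t), int(d)))
        iou[t, :] = -1.0   # this track can no longer match
        iou[:, d] = -1.0   # this detection can no longer match
    return matches
```

Unmatched detections would spawn new tracks and unmatched tracks age out, which is where the Kalman prediction step would come in.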