-
VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
-
3D Fully Convolutional Network for Vehicle Detection in Point Cloud
-
Online Learning for Human Classification in 3D LiDAR-based Tracking
- hdl_people_tracking is a ROS package for real-time people tracking using a 3D LIDAR
- Online learning for human classification in 3D LiDAR-based tracking: This is a ROS-based online learning framework for human classification in 3D LiDAR scans, taking advantage of robust multi-target tracking to avoid the need for data annotation by a human expert.
- L-CAS 3D Point Cloud People Dataset
-
PointPillars: Fast Encoders for Object Detection from Point Clouds
- Implementation of PointPillars in PyTorch for KITTI 3D Object Detection
- Point Pillars (3D Object Detection) Through Explanation with Code
- PointPillars: This repo demonstrates how to reproduce the results from PointPillars: Fast Encoders for Object Detection from Point Clouds (to be published at CVPR 2019) on the KITTI dataset by making the minimum required changes from the preexisting open source codebase SECOND.
- MMDetection3D is an open source object detection toolbox based on PyTorch, towards the next-generation platform for general 3D detection.
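To make the pillar idea concrete, here is a minimal NumPy sketch of the grouping step PointPillars builds on: points are binned into x-y grid cells ("pillars"), each point is augmented with its offset from the pillar centroid, and a per-pillar feature is produced by pooling. The function name `pillarize` and the max-pool are illustrative only; the real encoder uses a learned PointNet-style layer.

```python
import numpy as np

def pillarize(points, grid_size=0.5, max_pts=32):
    """Toy pillar encoder: group points into x-y cells and max-pool a feature.

    points: (N, 3) array of x, y, z coordinates.
    Returns a dict mapping pillar index (ix, iy) -> pooled (6,) feature.
    This is a sketch of the PointPillars grouping idea, not the real network.
    """
    idx = np.floor(points[:, :2] / grid_size).astype(int)
    pillars = {}
    for key in {tuple(k) for k in idx}:
        mask = np.all(idx == key, axis=1)
        pts = points[mask][:max_pts]
        # augment each point with its offset from the pillar centroid
        centroid = pts.mean(axis=0)
        feat = np.hstack([pts, pts - centroid])   # (n, 6)
        pillars[key] = feat.max(axis=0)           # max-pool over points
    return pillars
```

The resulting pillar grid is what gets scattered into a 2D pseudo-image for the downstream 2D CNN backbone.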
-
LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention
- Core Idea: This paper introduces an end-to-end 3D video object detector for LiDAR point clouds. The framework has two main components:
- Pillar Message Passing Network (PMPNet): Encodes spatial features within each frame by constructing a graph over pillars, where each pillar is treated as a node, allowing information sharing across neighboring pillars.
- Attentive Spatiotemporal Transformer GRU (AST-GRU): Aggregates temporal information across frames using Transformer attention mechanisms to align and emphasize foreground objects, improving detection across sequences.
- Dataset: Uses the nuScenes dataset, a large-scale dataset with continuous LiDAR point clouds for autonomous driving.
- Input/Output: The input is sequential LiDAR point cloud frames, and the output is 3D bounding boxes with object classifications.
- Relevance: Highly relevant, especially for tasks involving tracking. The AST-GRU’s spatiotemporal feature aggregation can help maintain consistent human tracking across frames.
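The graph-based message passing in PMPNet can be sketched roughly as follows. This is a toy NumPy version: `message_passing_step` and the max-aggregation are stand-ins for the paper's learned MLP/GRU node updates.

```python
import numpy as np

def message_passing_step(feats, coords, k=2):
    """One toy message-passing iteration over pillar nodes.

    feats:  (N, C) node features; coords: (N, 2) pillar centers.
    Each node aggregates the element-wise max of its k nearest
    neighbours' features and adds it to its own feature.
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude self from neighbours
    nbrs = np.argsort(d, axis=1)[:, :k]      # indices of k nearest neighbours
    msgs = feats[nbrs].max(axis=1)           # aggregate neighbour features
    return feats + msgs
```

Stacking several such iterations lets information propagate beyond immediate neighbours, which is the point of the graph formulation versus independent pillar encoding.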
-
Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving
- Core Idea: This paper proposes the Temporal-Channel Transformer (TCTR), designed to process sequential LiDAR frames by encoding temporal information (frame-to-frame) and channel-wise information. The TCTR model has two main parts:
- Temporal-Channel Encoder: Encodes temporal-channel information across frames, capturing dependencies between frame sequences and voxel channels.
- Spatial Decoder: Decodes spatial relationships within each frame to produce a dense representation for the current frame, refined to highlight objects.
- Dataset: Also uses nuScenes, ideal for evaluating video-based LiDAR detection performance.
- Input/Output: Input is consecutive frames converted into 2D pseudo-images (voxels), and output is 3D object bounding boxes.
- Relevance: Relevant for human detection and tracking in sparse data; however, TCTR is more focused on enhancing single-frame representations with past frames, which is less dynamic than PMPNet's graph-based message passing.
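The core operation inside the temporal-channel encoder is attention from the current frame into past frames. A single-head NumPy sketch (shapes and the helper names are assumptions; the paper's model is multi-head and learned):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(query, memory):
    """Single-head scaled dot-product attention across frames.

    query:  (Nq, C) features of the current frame.
    memory: (T, Nk, C) features from T past frames, used as keys/values.
    Returns (Nq, C): current-frame features re-estimated from the past.
    """
    kv = memory.reshape(-1, memory.shape[-1])          # (T*Nk, C)
    scores = query @ kv.T / np.sqrt(query.shape[-1])   # scaled similarities
    return softmax(scores, axis=-1) @ kv               # weighted value sum
```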
-
SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection:
- Core Idea: This paper introduces SA-Det3D, a self-attention module for 3D object detection, which integrates Full Self-Attention (FSA) and Deformable Self-Attention (DSA) to enhance the context-awareness of object detection models in point clouds. It aims to improve detection by allowing the network to capture long-range dependencies in the 3D space without increasing computational costs significantly.
- FSA computes interactions between all points for robust global context.
- DSA reduces computational demand by selectively attending to key points.
- Dataset: It utilizes several large-scale 3D datasets, including KITTI, nuScenes, and Waymo Open Dataset.
- Input/Output:
- Input: LiDAR point clouds, processed into pillars or voxels, with each frame as input.
- Output: 3D bounding boxes and classification labels for detected objects within each frame.
- Relevance to Your Project:
- This paper is relevant if your primary focus is detection, as the self-attention mechanism enhances the model’s ability to distinguish between humans and other objects by learning context across larger distances within a single frame.
- However, SA-Det3D lacks temporal tracking, which is essential for tracking and recognition across multiple frames. The self-attention improves frame-by-frame detection but does not utilize temporal information, limiting its effectiveness for continuous human tracking.
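The FSA/DSA trade-off can be illustrated with a small sketch: instead of attending over all N points (O(N²) pairs), attend only to m sampled key points (O(N·m)). Here farthest point sampling stands in for DSA's learned keypoint selection; all function names are illustrative.

```python
import numpy as np

def farthest_point_sample(coords, m):
    """Pick m spread-out point indices (a common keypoint-sampling heuristic)."""
    chosen = [0]
    dist = np.linalg.norm(coords - coords[0], axis=1)
    for _ in range(m - 1):
        nxt = int(dist.argmax())                # farthest from current set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(coords - coords[nxt], axis=1))
    return np.array(chosen)

def deformable_self_attention(feats, coords, m=4):
    """Attend from every point to only m sampled key points."""
    keys = feats[farthest_point_sample(coords, m)]
    scores = feats @ keys.T / np.sqrt(feats.shape[-1])
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    w = e / e.sum(axis=1, keepdims=True)        # softmax over the m keys
    return w @ keys
```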
-
Point Density-Aware Voxels for LiDAR 3D Object Detection:
- Core Idea: This paper presents Point Density-Aware Voxel Network (PDV), designed to address LiDAR's uneven point density at varying distances. PDV leverages density-aware Region of Interest (RoI) pooling and uses kernel density estimation (KDE) and self-attention to incorporate point density information directly into the detection process. PDV enhances voxel feature localization by calculating voxel centroids for greater spatial accuracy and aggregates point features to better capture object shapes.
- Voxel Point Centroid Localization: Calculates centroids of points in each voxel for more accurate localization.
- Density-Aware RoI Pooling: Uses KDE to capture local density around each grid point, adding density-based positional encoding in the self-attention module.
- Dataset: Evaluated on large-scale autonomous driving datasets, including the Waymo Open Dataset and KITTI.
- Input/Output:
- Input: LiDAR point cloud data with point coordinates and features (e.g., intensity).
- Output: 3D bounding boxes and object classifications, refined based on density awareness.
- Relevance to Your Project:
- Detection Accuracy: PDV's density-aware design improves detection accuracy in varying densities, which can be useful for human detection, especially in crowded scenes or at greater distances where point density may be lower.
- Tracking Limitations: This model focuses on refining object detection using density-based features in single frames. Unlike multi-frame methods, it doesn’t integrate temporal information, which limits its capability for continuous human tracking.
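The KDE component can be sketched as follows: a simplified isotropic Gaussian kernel density estimate at each RoI grid point, in NumPy. PDV's actual formulation and bandwidth choices may differ; this only shows the shape of the computation.

```python
import numpy as np

def local_density(grid_pts, cloud, bandwidth=0.5):
    """Gaussian-kernel density estimate at each query location.

    grid_pts: (G, 3) query locations; cloud: (N, 3) LiDAR points.
    Returns (G,) densities: high where points cluster, low in sparse regions.
    """
    d2 = ((grid_pts[:, None, :] - cloud[None, :, :]) ** 2).sum(-1)
    k = np.exp(-d2 / (2 * bandwidth ** 2))
    norm = (np.sqrt(2 * np.pi) * bandwidth) ** 3 * len(cloud)
    return k.sum(axis=1) / norm
```

A density value like this is what PDV feeds into its positional encoding so the attention module can reason about how well-observed each region is.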
- Explore the dataset to understand its structure (e.g., point cloud format, annotations, metadata).
- Visualize sample point cloud scans
- 3D Detection & Tracking Viewer: 3D detection and tracking viewer (visualization) for kitti & waymo dataset
dat.: dataset | cls.: classification | rel.: retrieval | seg.: segmentation
det.: detection | tra.: tracking | pos.: pose | dep.: depth
reg.: registration | rec.: reconstruction | aut.: autonomous driving
oth.: other, including normal-related, correspondence, mapping, matching, alignment, compression, generative model...
- [ModelNet] The Princeton ModelNet. [cls.]
- [ShapeNet] A collaborative dataset between researchers at Princeton, Stanford and TTIC. [seg.]
- [S3DIS] The Stanford Large-Scale 3D Indoor Spaces Dataset. [seg.]
- [ScanNet] Richly-annotated 3D Reconstructions of Indoor Scenes. [cls.seg.]
- [SUNRGB-D] 19 object categories for predicting a 3D bounding box in real world dimension. [det.]
- [Large-Scale Point Cloud Classification Benchmark (ETH)] This benchmark closes the gap and provides a large labelled 3D point cloud data set of natural scenes with over 4 billion points in total. [cls.]
- [Paris-Lille-3D] A large and high-quality ground truth urban point cloud dataset for automatic segmentation and classification. [cls.seg.]
- [KITTI] The KITTI Vision Benchmark Suite. [det.]
- Waymo Open Dataset
- ONCE 3D Object Detection Baselines
- NuScenes 3D Object Detection
- L-CAS 3D Point Cloud People Dataset
- Sydney Urban Objects Dataset
- IQmulus & TerraMobilita Contest
- [PartNet] The PartNet dataset provides fine-grained part annotation of objects in ShapeNetCore. [seg.]
- [PartNet] PartNet benchmark from Nanjing University and National University of Defense Technology. [seg.]
- [Stanford 3D] The Stanford 3D Scanning Repository. [reg.]
- [UWA Dataset] [cls.seg.reg.]
- [Princeton Shape Benchmark] The Princeton Shape Benchmark.
- [SYDNEY URBAN OBJECTS DATASET] This dataset contains a variety of common urban road objects scanned with a Velodyne HDL-64E LIDAR, collected in the CBD of Sydney, Australia. There are 631 individual scans of objects across classes of vehicles, pedestrians, signs and trees. [cls.match.]
- [ASL Datasets Repository (ETH)] This site is dedicated to providing datasets for the Robotics community with the aim to facilitate result evaluations and comparisons. [cls.match.reg.det.]
- [Robotic 3D Scan Repository] The Canadian Planetary Emulation Terrain 3D Mapping Dataset is a collection of three-dimensional laser scans gathered at two unique planetary analogue rover test facilities in Canada.
- [Radish] The Robotics Data Set Repository (Radish for short) provides a collection of standard robotics data sets.
- [IQmulus & TerraMobilita Contest] The database contains 3D MLS data from a dense urban environment in Paris (France), composed of 300 million points. The acquisition was made in January 2013. [cls.seg.det.]
- [Oakland 3-D Point Cloud Dataset] This repository contains labeled 3-D point cloud laser data collected from a moving platform in an urban environment.
- [Robotic 3D Scan Repository] This repository provides 3D point clouds from robotic experiments, log files of robot runs and standard 3D data sets for the robotics community.
- [Ford Campus Vision and Lidar Data Set] The dataset is collected by an autonomous ground vehicle testbed, based upon a modified Ford F-250 pickup truck.
- [The Stanford Track Collection] This dataset contains about 14,000 labeled tracks of objects as observed in natural street scenes by a Velodyne HDL-64E S2 LIDAR.
- [PASCAL3D+] Beyond PASCAL: A Benchmark for 3D Object Detection in the Wild. [pos.det.]
- [3D MNIST] The aim of this dataset is to provide a simple way to get started with 3D computer vision problems such as 3D shape recognition. [cls.]
- [WAD] This dataset is provided by Baidu Inc.
- [nuScenes] The nuScenes dataset is a large-scale autonomous driving dataset.
- [PreSIL] Depth information, semantic segmentation (images), point-wise segmentation (point clouds), ground point labels (point clouds), and detailed annotations for all vehicles and people. [paper] [det.aut.]
- [3D Match] Keypoint Matching Benchmark, Geometric Registration Benchmark, RGB-D Reconstruction Datasets. [reg.rec.oth.]
- [BLVD] (a) 3D detection, (b) 4D tracking, (c) 5D interactive event recognition and (d) 5D intention prediction. [ICRA 2019 paper] [det.tra.aut.oth.]
- [PedX] 3D Pose Estimation of Pedestrians, more than 5,000 pairs of high-resolution (12MP) stereo images and LiDAR data along with 2D and 3D labels of pedestrians. [ICRA 2019 paper] [pos.aut.]
- [H3D] Full-surround 3D multi-object detection and tracking dataset. [ICRA 2019 paper] [det.tra.aut.]
- [Argoverse BY ARGO AI] Two public datasets (3D Tracking and Motion Forecasting) supported by highly detailed maps to test, experiment, and teach self-driving vehicles how to understand the world around them. [CVPR 2019 paper] [tra.aut.]
- [Matterport3D] RGB-D: 10,800 panoramic views from 194,400 RGB-D images. Annotations: surface reconstructions, camera poses, and 2D and 3D semantic segmentations. Keypoint matching, view overlap prediction, normal prediction from color, semantic segmentation, and scene classification. [3DV 2017 paper] [code] [blog]
- [SynthCity] SynthCity is a 367.9M point synthetic full colour Mobile Laser Scanning point cloud. Nine categories. [seg.aut.]
- [Lyft Level 5] Includes high quality, human-labelled 3D bounding boxes of traffic agents and an underlying HD spatial semantic map. [det.seg.aut.]
- [SemanticKITTI] Sequential Semantic Segmentation, 28 classes, for autonomous driving. All sequences of KITTI odometry labeled. [ICCV 2019 paper] [seg.oth.aut.]
- [The Waymo Open Dataset] The Waymo Open Dataset is comprised of high resolution sensor data collected by Waymo self-driving cars in a wide variety of conditions. [det.]
- [A*3D] An Autonomous Driving Dataset in Challenging Environments. [det.]
- Yochengliu/awesome-point-cloud-analysis: A list of papers and datasets about point cloud analysis (processing)
- timzhang642/3D-Machine-Learning: A resource repository for 3D machine learning
- Dataset from Yulan Guo's Homepage
- Research state-of-the-art models for 3D point cloud detection, recognition, and tracking (e.g., PointPillars, PV-RCNN, CenterPoint).
- Person-MinkUNet: 3D Person Detection with LiDAR Point Cloud
- PV-CNN
- OpenPCDet
- PointRCNN
- Part-A2-Net
- PV-RCNN
- Voxel R-CNN
- PV-RCNN++
- MPPNet
- Contains implementations of models evaluated on 3D object detection for nuScenes
- EA-LSS: Edge-aware Lift-splat-shot Framework for 3D BEV Object Detection
- kaggle competition: Lyft 3D Object Detection for Autonomous Vehicles: contains participants' code
- kaggle competition: KITTI-3D-Object-Detection-Dataset: contains participants' code
- and others
- Compare models based on their strengths, weaknesses, and suitability for the project.
- Select the primary models to implement (e.g., PointPillars for real-time performance, PV-RCNN for accuracy ...).
- Document the rationale behind model selection for future reference.
- Document the data formats required for model training
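As a starting point for that documentation: KITTI label files are plain text with 15 space-separated fields per object (field order per the KITTI devkit). A small parser sketch; `parse_kitti_label` is an illustrative helper name.

```python
def parse_kitti_label(line):
    """Parse one line of a KITTI label file into a dict.

    The 15 fields are: type, truncated, occluded, alpha,
    2D bbox (left, top, right, bottom), dimensions (h, w, l),
    location (x, y, z) in camera coordinates, and rotation_y.
    """
    f = line.split()
    return {
        "type": f[0],
        "truncated": float(f[1]),
        "occluded": int(f[2]),
        "alpha": float(f[3]),
        "bbox_2d": [float(v) for v in f[4:8]],
        "dimensions_hwl": [float(v) for v in f[8:11]],
        "location_xyz": [float(v) for v in f[11:14]],
        "rotation_y": float(f[14]),
    }
```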
- Implement models or find code from GitHub, Papers with Code, and Kaggle.
- Evaluate model performance on the validation set using relevant metrics (e.g., precision, recall, F1 score, IoU).
- Compare results from different models to determine the best-performing architecture.
- Identify and analyze failure cases (e.g., missed detections, misclassifications).
- Log and track model performance metrics for each model tested.
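For the IoU metric above, a minimal axis-aligned 3D IoU can serve as a first sanity check. Note that KITTI/nuScenes evaluation uses yaw-rotated boxes, so this sketch deliberately understates the real computation.

```python
def _volume(box):
    """Volume of an axis-aligned box (xmin, ymin, zmin, xmax, ymax, zmax)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

def iou_3d_axis_aligned(a, b):
    """IoU of two axis-aligned 3D boxes, each (xmin, ymin, zmin, xmax, ymax, zmax)."""
    inter = 1.0
    for i in range(3):
        lo, hi = max(a[i], b[i]), min(a[i + 3], b[i + 3])
        if hi <= lo:
            return 0.0          # no overlap along this axis
        inter *= hi - lo
    return inter / (_volume(a) + _volume(b) - inter)
```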
- Implement multi-object tracking using models like AB3DMOT or integrate tracking into the detection pipeline.
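The association step at the heart of AB3DMOT-style tracking can be sketched with a greedy matcher over a track-detection IoU matrix. AB3DMOT itself uses the Hungarian algorithm plus a per-track Kalman filter; `greedy_associate` is a simplified stand-in showing only data association.

```python
import numpy as np

def greedy_associate(iou_matrix, iou_thresh=0.1):
    """Greedily match tracks (rows) to detections (cols) by descending IoU.

    Each track and detection is used at most once; pairs below
    iou_thresh are left unmatched. Returns a list of (track, det) pairs.
    """
    iou = iou_matrix.astype(float).copy()
    matches = []
    while iou.size and iou.max() > iou_thresh:
        t, d = np.unravel_index(iou.argmax(), iou.shape)
        matches.append((int(t), int(d)))
        iou[t, :] = -1.0   # this track can no longer match
        iou[:, d] = -1.0   # this detection can no longer match
    return matches
```

Unmatched detections would spawn new tracks and unmatched tracks age out, which is where the Kalman prediction step would come in.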