Semantic mapping for Autonomous Vehicles

Semantic map example

Octomap generation

An example of an Octomap created with additional semantic information. The recording was sped up 10x to show the complete mapping process.

The Docker container

To build the Docker image (it requires around 12.6 GB of disk space), clone the repository and, once inside the directory, run the following command:

docker build -t semantic-mapping .

You can start the container by running:

bash start_container.sh

Note

You can adjust the paths to the directory with rosbags and to the segmentation model inside the bash script.

Warning

By default the Docker container is removed after being closed and all data inside it is lost. To avoid that, you can remove the --rm flag, save the data in shared folders, or copy the data from the container to your computer.
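
For reference, the command wrapped by start_container.sh is roughly of the following shape (the mount paths and the --gpus flag below are illustrative placeholders, not the script's exact contents; GPU access assumes the NVIDIA Container Toolkit is installed):

# Hypothetical host paths -- adjust them (or edit start_container.sh) to match your setup
docker run -it --rm --gpus all \
    -v /path/to/rosbags:/root/Shared/rosbags \
    -v /path/to/segmentation_model:/root/models \
    semantic-mapping

Dropping --rm keeps the stopped container around, so any data written outside the mounted folders survives until you remove the container yourself.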

ROS2 Node for mapping with semantic information

Semantic map example 2

Additional semantic information can be valuable in many robotic tasks, particularly navigation and path planning. For example, it enables path cost assignment based on terrain type (allowing robots to favor asphalt over slippery or muddy surfaces for safer, faster travel) and helps distinguish static from dynamic obstacles, which is useful when planning avoidance strategies.

To start the ROS2 node that runs segmentation on camera images and colors the point cloud from a LiDAR/depth camera according to the segmentation mask, run:

ros2 launch segmentation_node segmentation_node_launch.py

Note

You can list all adjustable parameters (topic names, frame IDs, disabling Octomap, the number of threads, etc.) by running ros2 launch segmentation_node segmentation_node_launch.py --show-args.

Pointcloud with colors based on semantic segmentation

The ROS2 node implements a synchronized processing pipeline for semantic mapping. It subscribes to RGB camera images and point cloud data on topics specified as parameters. The images go through preprocessing, including resolution adjustment and normalization with ImageNet statistics, before being passed to an ONNX-format segmentation model for inference. Using the transformation between the camera and LiDAR frames (obtained from the /tf topic), each 3D point is projected onto the segmentation mask and assigned the semantic color of its predicted label. The resulting point cloud is then published on another topic, where the OctoMap server can subscribe to it and construct a semantic 3D map of the environment.
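
The projection step can be summarized with the sketch below (a simplified NumPy illustration, not the node's actual code; the intrinsic matrix K, the 4x4 camera-from-LiDAR transform T_cam_lidar, and the id_to_color lookup table are assumed inputs):

import numpy as np

def colorize_points(points_xyz, K, T_cam_lidar, mask, id_to_color):
    """Assign a semantic color to each LiDAR point by projecting it onto the segmentation mask."""
    n = points_xyz.shape[0]

    # Transform points from the LiDAR frame to the camera frame (homogeneous coordinates)
    pts_h = np.hstack([points_xyz, np.ones((n, 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera
    in_front = pts_cam[:, 2] > 0.0
    pts_cam = pts_cam[in_front]

    # Pinhole projection onto the image plane
    uv = (K @ pts_cam.T).T
    u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v = np.round(uv[:, 1] / uv[:, 2]).astype(int)

    # Discard points that fall outside the segmentation mask
    h, w = mask.shape
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)

    # Look up the predicted class ID of each remaining point and map it to its color
    labels = mask[v[valid], u[valid]]
    colors = np.array([id_to_color[int(label)] for label in labels], dtype=np.uint8)
    return points_xyz[in_front][valid], colors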

The launch file starts the segmentation node, the Octomap server, and RViz for visualization. By default the Octomap is not shown in RViz because updating its visualization takes a long time (the map itself is built without issues), but you can enable it at any time. With the default settings, processing a point cloud together with the corresponding RGB image takes on average around 50 ms (20 fps) on a computer with an AMD Ryzen 5 5600X, 32 GB of RAM, and an NVIDIA RTX 4070. Running single-threaded slowed processing to around 10 fps, while running on all 12 threads reached 25-30 fps.

Class filtering

The segmentation node can remove points belonging to selected classes. Users configure which classes are visible through the classes_with_colors parameter in JSON format, where a boolean value (true or false) determines whether points from that class are included in the final point cloud. This can be used to create a general map of the environment with all movable objects removed (e.g. cars and people on a road or parking lot), or to build a 3D model of a specific object that is the robot's target (e.g. a cup on a desk).
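
As an illustration, hiding dynamic classes could look roughly like the launch invocation below. The exact JSON schema of classes_with_colors is an assumption based on the description above; check the output of ros2 launch segmentation_node segmentation_node_launch.py --show-args for the real format:

ros2 launch segmentation_node segmentation_node_launch.py \
    classes_with_colors:='{"Car": false, "Person": false, "Road": true, "Building": true}'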

Example of filtering

Segmentation models

Class Definitions

ID Category Color (hex) Color (RGB)
0 Other #000000 (0, 0, 0)
1 Sky #4682B4 (70, 130, 180)
2 Building #464646 (70, 70, 70)
3 Grass #98FB98 (152, 251, 152)
4 Road #804080 (128, 64, 128)
5 Tree #6B8E23 (107, 142, 35)
6 Traffic objects #DCDC00 (220, 220, 0)
7 Car #00008E (0, 0, 142)
8 Person #FF0000 (255, 0, 0)
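
For convenience when post-processing the published point clouds, the same mapping can be expressed as a Python dictionary (a sketch derived from the table above, not code taken from the repository):

# Class ID -> (category name, RGB color), as listed in the table above
CLASS_COLORS = {
    0: ("Other", (0, 0, 0)),
    1: ("Sky", (70, 130, 180)),
    2: ("Building", (70, 70, 70)),
    3: ("Grass", (152, 251, 152)),
    4: ("Road", (128, 64, 128)),
    5: ("Tree", (107, 142, 35)),
    6: ("Traffic objects", (220, 220, 0)),
    7: ("Car", (0, 0, 142)),
    8: ("Person", (255, 0, 0)),
}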

Segmentation models visualisations

Visualisations of the segmentation results for DeepLabV3+, UNet, LinkNet, SegFormer, and FPN (each shown with a citation to the original paper).

Inference performance

Inference performance across semantic segmentation models (over 1000 iterations). Latency values are reported in milliseconds (ms), memory in megabytes (MB), and peak GPU utilization in percent (%). Bold values mark the best results among full-resolution models, and underlined values mark the best results among low-resolution models.

Model Encoder Resolution Mean Latency (ms) Std Latency (ms) Min Latency (ms) Max Latency (ms) GPU Mem (MB) GPU Util (%)
DeepLabV3+ EfficientNet-B2 960 × 608 11.823 0.103 11.687 13.599 72 55
DeepLabV3+ MobileNetV4 960 × 608 5.646 0.269 5.355 13.505 24 53
DeepLabV3+ ResNet-34 960 × 608 10.999 0.405 10.817 19.826 138 60
DeepLabV3+ EfficientNet-B2 480 × 304 4.381 0.114 4.043 4.919 72 39
FPN EfficientNet-B2 960 × 608 11.114 0.327 10.934 20.954 72 55
FPN MobileNetV4 960 × 608 6.974 0.265 6.671 14.702 24 56
FPN ResNet-34 960 × 608 10.922 0.116 10.751 13.620 138 55
SegFormer EfficientNet-B2 960 × 608 12.937 0.319 12.720 22.477 72 58
SegFormer MobileNetV4 960 × 608 8.589 0.288 8.407 17.343 24 47
SegFormer ResNet-34 960 × 608 12.881 0.089 12.717 13.559 138 59
U-Net EfficientNet-B2 960 × 608 12.220 0.438 11.976 20.344 72 59
U-Net MobileNetV4 960 × 608 8.562 0.273 8.362 10.776 42 54
U-Net ResNet-34 960 × 608 12.637 0.416 12.424 20.613 138 61
U-Net ResNet-34 480 × 304 4.343 0.226 4.040 5.776 138 45
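
The latency columns can be reproduced in spirit with a simple ONNX Runtime timing loop such as the sketch below (the model file name, input layout, and execution provider are assumptions; it does not replicate the exact benchmarking setup used for the table):

import time
import numpy as np
import onnxruntime as ort

# Assumed ONNX model and input layout: NCHW, 960 x 608 (width x height), batch of 1
session = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 608, 960).astype(np.float32)

# Warm-up so CUDA initialization and graph optimizations do not distort the measurements
for _ in range(10):
    session.run(None, {input_name: dummy})

latencies = []
for _ in range(1000):
    start = time.perf_counter()
    session.run(None, {input_name: dummy})
    latencies.append((time.perf_counter() - start) * 1000.0)  # milliseconds

print(f"mean {np.mean(latencies):.3f} ms, std {np.std(latencies):.3f} ms, "
      f"min {np.min(latencies):.3f} ms, max {np.max(latencies):.3f} ms")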

Segmentation Annotation Viewer


A tool for interactively browsing raw camera frames and their corresponding *_segmented_mask.png overlays. It automatically scans all images_* folders under annotations2/, pairs each mask with its _image_raw_*.png counterpart (by the suffix after __), and displays them vertically. Use “Prev”/“Next” to navigate and “Delete” to move bad pairs into deleted/.
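
The pairing logic boils down to matching file-name suffixes; a hypothetical simplified version is sketched below (the naming convention assumed here is inferred from the description above, so the real tool may differ):

from pathlib import Path

def suffix_key(name: str) -> str:
    # Assumed convention: the part after the double underscore identifies the frame
    return name.split("__", 1)[-1]

def collect_pairs(root: str = "annotations2"):
    """Pair each *_segmented_mask.png with the raw frame sharing its suffix."""
    pairs = []
    for folder in sorted(Path(root).glob("images_*")):
        raw_frames = {suffix_key(p.stem): p for p in folder.glob("*_image_raw_*.png")}
        for mask in sorted(folder.glob("*_segmented_mask.png")):
            key = suffix_key(mask.stem).removesuffix("_segmented_mask")
            if key in raw_frames:
                pairs.append((raw_frames[key], mask))
    return pairs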

Wandb Training Logs

https://wandb.ai/qbizm/seg_sem_veh_pub


Tools

Get images from rosbags

To save all images from a ROS2 bag (and undistort them), go into the utils directory and run:

python3 ros2_bag_to_png.py --bagfile <path to rosbag> --topics <string with topics split with commas> --output <path to save the images (a subdir with name of rosbag will be created there)>

Example:

python3 ros2_bag_to_png.py --bagfile ~/Shared/rosbags/rosbag2_2025_04_14-17_54_17/ --topics '/sensing/camera/front/image_raw' --output '/root/images_from_rosbags/'

Note

The distortion parameters are defined inside the script; adjust them to match your camera. The parameters can be found in the appropriate ROS2 topic (typically the camera's camera_info topic).
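
For a single frame, the undistortion itself is a standard OpenCV call; a standalone sketch with placeholder calibration values is shown below (the script's internals may differ, and the real camera matrix and distortion coefficients should come from your camera's camera_info topic):

import cv2
import numpy as np

# Placeholder intrinsics and distortion coefficients -- replace with values from camera_info
camera_matrix = np.array([[1000.0, 0.0, 480.0],
                          [0.0, 1000.0, 304.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.array([-0.3, 0.1, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3

img = cv2.imread("frame.png")
undistorted = cv2.undistort(img, camera_matrix, dist_coeffs)
cv2.imwrite("frame_undistorted.png", undistorted)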

Annotate the data with GroundingDINO and SAM2

A complete pipeline for automatically annotating images is available in the segmentation_with_sam directory.