RT-BEV is an innovative framework designed to provide real-time, vision-centric Bird's Eye View (BEV) perception for autonomous vehicles (AVs). BEV perception is essential for improving situational awareness, navigation, and decision-making by offering a 360-degree view of the environment using multi-camera systems. RT-BEV addresses the challenges of computational overhead in real-time BEV perception through a dynamically optimized pipeline that focuses on critical areas of interest.
RT-BEV enhances real-time performance by co-optimizing message communication and detection. It utilizes dynamic Regions of Interest (ROIs) tailored to the driving environment, reducing unnecessary processing and improving accuracy, while minimizing overall end-to-end (e2e) latency.
- ROI-Aware Perception: Dynamically adapts ROIs based on traffic environments and driving contexts, focusing computational power on the most critical areas.
- ROI-Aware Communication & Synchronization: ROI-aware camera synchronization reduces waiting time for multi-camera inputs, keeping the pipeline real-time.
- Real-time BEV Perception: Achieves low-latency BEV generation while maintaining high accuracy.
The following figure illustrates the end-to-end BEV perception pipeline used in RT-BEV:
RT-BEV’s modular design optimizes real-time data processing and synchronization to meet the challenges of autonomous driving. The core components include:
- Camera Synchronizer: Synchronizes input from multiple cameras using an ROI-aware policy, reducing synchronization delays and focusing processing on important areas of the scene.
- ROIs Generator: Dynamically generates context-aware ROIs based on the driving environment, minimizing the computational load by focusing only on the most important regions.
- Feature Split & Merge: Processes ROIs with high accuracy while using temporal locality for non-critical regions, merging the processed feature maps for complete scene understanding.
- Time Predictor: Forecasts per-ROI processing time and adjusts priorities in real time based on time-to-collision (TTC) predictions, so the most critical areas are handled first (see the sketch after this list).
- Coordinator: Manages synchronization strategies, keyframe frequency, and computational resources, balancing real-time performance and detection accuracy.
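To make the Time Predictor and ROI prioritization concrete, here is a minimal Python sketch of the general idea: estimate a time-to-collision per candidate ROI, keep the most urgent ones within a budget, and estimate their processing time from ROI area. The class, the constants, and the constant-velocity TTC model are illustrative assumptions, not RT-BEV's actual implementation.

```python
# Illustrative sketch (not RT-BEV's actual API): rank candidate ROIs by
# time-to-collision so the most safety-critical regions are processed first.
from dataclasses import dataclass

@dataclass
class RoiCandidate:
    box: tuple                # (x1, y1, x2, y2) in image coordinates
    distance_m: float         # estimated distance to the object
    closing_speed_mps: float  # positive when the object is approaching

def time_to_collision(roi: RoiCandidate) -> float:
    """Constant-velocity TTC estimate; returns inf for receding objects."""
    if roi.closing_speed_mps <= 0.0:
        return float("inf")
    return roi.distance_m / roi.closing_speed_mps

def prioritize_rois(candidates, ttc_budget_s=3.0, max_rois=4):
    """Keep only the most urgent ROIs (lowest TTC) within a fixed budget."""
    urgent = [r for r in candidates if time_to_collision(r) <= ttc_budget_s]
    urgent.sort(key=time_to_collision)
    return urgent[:max_rois]

def estimate_processing_time_ms(roi: RoiCandidate, ms_per_kpixel=0.8, overhead_ms=5.0):
    """Illustrative linear model: processing time grows with ROI area."""
    x1, y1, x2, y2 = roi.box
    kpixels = max(0, x2 - x1) * max(0, y2 - y1) / 1000.0
    return overhead_ms + ms_per_kpixel * kpixels
```

A coordinator-style component could then compare the summed time estimates against the per-frame latency budget and shrink or defer the least urgent ROIs.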
The design architecture is shown in the following figure:
By focusing computational resources on dynamic ROIs, RT-BEV significantly reduces latency while maintaining high accuracy in BEV perception, critical for real-time decision-making in autonomous driving scenarios.
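As a rough illustration of the feature split-and-merge idea, the following PyTorch sketch runs the heavy backbone only on ROI crops and reuses a cached feature map (e.g., from a previous keyframe) for the rest of the frame. The function name, the stride convention, and the caching scheme are assumptions for illustration, not RT-BEV's actual API.

```python
# Illustrative PyTorch sketch of ROI split-and-merge (names are hypothetical):
# run the backbone only on ROI crops, reuse cached features elsewhere.
import torch

def split_and_merge(image, rois, backbone, cached_features, stride=8):
    """image: (C, H, W); rois: list of (x1, y1, x2, y2) aligned to `stride`;
    cached_features: (C_feat, H // stride, W // stride) from a previous keyframe.
    Assumes the backbone downsamples its input by exactly `stride`."""
    merged = cached_features.clone()  # start from temporally reused features
    for x1, y1, x2, y2 in rois:
        crop = image[:, y1:y2, x1:x2].unsqueeze(0)   # (1, C, h, w)
        with torch.no_grad():
            feat = backbone(crop).squeeze(0)          # (C_feat, h // stride, w // stride)
        merged[:, y1 // stride:y2 // stride, x1 // stride:x2 // stride] = feat
    return merged
```

In practice the ROI boxes would come from the ROIs Generator and be snapped to the backbone's stride so that crops map cleanly into the cached feature grid.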
The RT-BEV system is implemented in Python using PyTorch for deep learning and ROS for real-time data processing and synchronization. Key implementation components include:
- Torch Inference: PyTorch models process multi-camera images to generate BEV representations.
- ROS Integration: ROS nodes manage real-time camera synchronization, image publishing, and BEV processing to provide seamless communication across modules.
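As a point of reference for the ROS Integration component, the sketch below shows a plain multi-camera synchronization node built on ROS `message_filters`; RT-BEV's synchronizer layers its ROI-aware policy on top of this kind of baseline. The node name and camera topic names are placeholders, not the topics used by the provided launch files.

```python
#!/usr/bin/env python
# Illustrative ROS node: approximately synchronize six camera streams
# before handing them to BEV inference. Topic names are placeholders.
import rospy
import message_filters
from sensor_msgs.msg import Image

CAMERA_TOPICS = [
    "/cam_front/image_raw", "/cam_front_left/image_raw", "/cam_front_right/image_raw",
    "/cam_back/image_raw", "/cam_back_left/image_raw", "/cam_back_right/image_raw",
]

def synced_callback(*images):
    # All six images fall within the allowed time slop; hand them to inference here.
    stamps = [img.header.stamp.to_sec() for img in images]
    rospy.loginfo("Synced frame, max skew: %.3f s", max(stamps) - min(stamps))

if __name__ == "__main__":
    rospy.init_node("multi_camera_synchronizer")
    subs = [message_filters.Subscriber(t, Image) for t in CAMERA_TOPICS]
    sync = message_filters.ApproximateTimeSynchronizer(subs, queue_size=10, slop=0.05)
    sync.registerCallback(synced_callback)
    rospy.spin()
```

The `slop` parameter bounds how far apart the six image timestamps may be for a set to count as one synchronized frame.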
To set up the environment, prepare datasets, and run the system, please refer to the following guides:
The nuScenes v1.0 mini dataset is already included in the Docker container, so no additional setup is required for initial testing.
Step 1: Open four terminals connected to the Docker container:
docker exec -it container_name bash
Step 2: In Terminal 1, run the ROS master node:
roscore
Step 3: In Terminal 2, run the RT-BEV inference node:
conda activate uniad
cd UniAD
./tool/uniad_inference.sh
Step 4: In Terminal 3, run the multi-camera synchronization node:
source /home/mobilitylab/catkin_ws/devel/setup.bash
roslaunch rtbev_message_filters synchronizer.launch
Step 5: In Terminal 4, publish camera images:
source /home/mobilitylab/catkin_ws/devel/setup.bash
rosrun video_stream_opencv ros_publish_multi_cameras.py
For more detailed instructions, refer to the Running RT-BEV guide.
RT-BEV has been evaluated on the nuScenes v1.0 dataset, achieving the following:
- Accurate BEV representations with reduced computational latency.
- Efficient multi-camera synchronization, ensuring smooth and real-time image processing.
- Robust handling of complex driving scenarios, even in dense environments with multiple dynamic and static objects.
If you use this work, please cite it as follows:
@inproceedings{liu2024bev,
title = {RT-BEV: Enhancing Real-Time BEV Perception for Autonomous Vehicles},
author = {Liu, Liangkai and Lee, Jinkyu and Shin, Kang G.},
booktitle = {45th IEEE Real-Time Systems Symposium},
  year      = {2024}
}

