RT-BEV is an innovative framework designed to provide real-time, vision-centric Bird's Eye View (BEV) perception for autonomous vehicles (AVs). BEV perception is essential for improving situational awareness, navigation, and decision-making by offering a 360-degree view of the environment using multi-camera systems. RT-BEV addresses the challenges of computational overhead in real-time BEV perception through a dynamically optimized pipeline that focuses on critical areas of interest.
RT-BEV enhances real-time performance by co-optimizing message communication and detection. It utilizes dynamic Regions of Interest (ROIs) tailored to the driving environment, reducing unnecessary processing and improving accuracy, while minimizing overall end-to-end (e2e) latency.
- ROI-Aware Perception: Dynamically adapts ROIs based on traffic environments and driving contexts, focusing computational power on the most critical areas.
- ROI-Aware Communication & Synchronization: ROI-aware camera synchronization reduces waiting time for multi-camera inputs, keeping the pipeline real-time.
- Real-time BEV Perception: Achieves low-latency BEV generation while maintaining high accuracy.
The following figure illustrates the end-to-end BEV perception pipeline used in RT-BEV:
RT-BEV’s modular design optimizes real-time data processing and synchronization to meet the challenges of autonomous driving. The core components include:
- Camera Synchronizer: Synchronizes input from multiple cameras using an ROI-aware policy, reducing synchronization delays and focusing processing on important areas of the scene.
- ROIs Generator: Dynamically generates context-aware ROIs based on the driving environment, minimizing the computational load by focusing only on the most important regions.
- Feature Split & Merge: Processes ROIs with high accuracy while using temporal locality for non-critical regions, merging the processed feature maps for complete scene understanding.
- Time Predictor: Forecasts per-ROI processing time and adjusts priorities in real time based on time-to-collision (TTC) predictions, so the most critical areas are handled first (see the sketch after this list).
- Coordinator: Manages synchronization strategies, keyframe frequency, and computational resources, balancing real-time performance and detection accuracy.
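To make the Time Predictor and ROI prioritization concrete, here is a minimal Python sketch of the general idea: estimate a time-to-collision per candidate ROI, keep the most urgent ones within a budget, and estimate their processing time from ROI area. The class, the constants, and the constant-velocity TTC model are illustrative assumptions, not RT-BEV's actual implementation.

```python
# Illustrative sketch (not RT-BEV's actual API): rank candidate ROIs by
# time-to-collision so the most safety-critical regions are processed first.
from dataclasses import dataclass

@dataclass
class RoiCandidate:
    box: tuple                # (x1, y1, x2, y2) in image coordinates
    distance_m: float         # estimated distance to the object
    closing_speed_mps: float  # positive when the object is approaching

def time_to_collision(roi: RoiCandidate) -> float:
    """Constant-velocity TTC estimate; returns inf for receding objects."""
    if roi.closing_speed_mps <= 0.0:
        return float("inf")
    return roi.distance_m / roi.closing_speed_mps

def prioritize_rois(candidates, ttc_budget_s=3.0, max_rois=4):
    """Keep only the most urgent ROIs (lowest TTC) within a fixed budget."""
    urgent = [r for r in candidates if time_to_collision(r) <= ttc_budget_s]
    urgent.sort(key=time_to_collision)
    return urgent[:max_rois]

def estimate_processing_time_ms(roi: RoiCandidate, ms_per_kpixel=0.8, overhead_ms=5.0):
    """Illustrative linear model: processing time grows with ROI area."""
    x1, y1, x2, y2 = roi.box
    kpixels = max(0, x2 - x1) * max(0, y2 - y1) / 1000.0
    return overhead_ms + ms_per_kpixel * kpixels
```

A coordinator-style component could then compare the summed time estimates against the per-frame latency budget and shrink or defer the least urgent ROIs.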
The design architecture is shown in the following figure:
By focusing computational resources on dynamic ROIs, RT-BEV significantly reduces latency while maintaining high accuracy in BEV perception, critical for real-time decision-making in autonomous driving scenarios.
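As a rough illustration of the feature split-and-merge idea, the following PyTorch sketch runs the heavy backbone only on ROI crops and reuses a cached feature map (e.g., from a previous keyframe) for the rest of the frame. The function name, the stride convention, and the caching scheme are assumptions for illustration, not RT-BEV's actual API.

```python
# Illustrative PyTorch sketch of ROI split-and-merge (names are hypothetical):
# run the backbone only on ROI crops, reuse cached features elsewhere.
import torch

def split_and_merge(image, rois, backbone, cached_features, stride=8):
    """image: (C, H, W); rois: list of (x1, y1, x2, y2) aligned to `stride`;
    cached_features: (C_feat, H // stride, W // stride) from a previous keyframe.
    Assumes the backbone downsamples its input by exactly `stride`."""
    merged = cached_features.clone()  # start from temporally reused features
    for x1, y1, x2, y2 in rois:
        crop = image[:, y1:y2, x1:x2].unsqueeze(0)   # (1, C, h, w)
        with torch.no_grad():
            feat = backbone(crop).squeeze(0)          # (C_feat, h // stride, w // stride)
        merged[:, y1 // stride:y2 // stride, x1 // stride:x2 // stride] = feat
    return merged
```

In practice the ROI boxes would come from the ROIs Generator and be snapped to the backbone's stride so that crops map cleanly into the cached feature grid.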
The RT-BEV system is implemented in Python using PyTorch for deep learning and ROS for real-time data processing and synchronization. Key implementation components include:
- Torch Inference: PyTorch models process multi-camera images to generate BEV representations.
- ROS Integration: ROS nodes manage real-time camera synchronization, image publishing, and BEV processing to provide seamless communication across modules.
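As a point of reference for the ROS Integration component, the sketch below shows a plain multi-camera synchronization node built on ROS `message_filters`; RT-BEV's synchronizer layers its ROI-aware policy on top of this kind of baseline. The node name and camera topic names are placeholders, not the topics used by the provided launch files.

```python
#!/usr/bin/env python
# Illustrative ROS node: approximately synchronize six camera streams
# before handing them to BEV inference. Topic names are placeholders.
import rospy
import message_filters
from sensor_msgs.msg import Image

CAMERA_TOPICS = [
    "/cam_front/image_raw", "/cam_front_left/image_raw", "/cam_front_right/image_raw",
    "/cam_back/image_raw", "/cam_back_left/image_raw", "/cam_back_right/image_raw",
]

def synced_callback(*images):
    # All six images fall within the allowed time slop; hand them to inference here.
    stamps = [img.header.stamp.to_sec() for img in images]
    rospy.loginfo("Synced frame, max skew: %.3f s", max(stamps) - min(stamps))

if __name__ == "__main__":
    rospy.init_node("multi_camera_synchronizer")
    subs = [message_filters.Subscriber(t, Image) for t in CAMERA_TOPICS]
    sync = message_filters.ApproximateTimeSynchronizer(subs, queue_size=10, slop=0.05)
    sync.registerCallback(synced_callback)
    rospy.spin()
```

The `slop` parameter bounds how far apart the six image timestamps may be for a set to count as one synchronized frame.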
To set up the environment, prepare datasets, and run the system, please refer to the following guides:
The nuScenes v1.0 mini dataset is already included in the Docker container, so no additional setup is required for initial testing.
Step 1: Open four terminals connected to the Docker container:
docker exec -it container_name bash
Step 2: In Terminal 1, run the ROS master node:
roscore
Step 3: In Terminal 2, run the RT-BEV inference node:
conda activate uniad
cd UniAD
./tool/uniad_inference.sh
Step 4: In Terminal 3, run the multi-camera synchronization node:
source /home/mobilitylab/catkin_ws/devel/setup.bash
roslaunch rtbev_message_filters synchronizer.launch
Step 5: In Terminal 4, publish camera images:
source /home/mobilitylab/catkin_ws/devel/setup.bash
rosrun video_stream_opencv ros_publish_multi_cameras.py
For more detailed instructions, refer to the Running RT-BEV guide.
RT-BEV has been evaluated on the nuScenes v1.0 dataset, achieving the following:
- Accurate BEV representations with reduced computational latency.
- Efficient multi-camera synchronization, ensuring smooth and real-time image processing.
- Robust handling of complex driving scenarios, even in dense environments with multiple dynamic and static objects.
If you use this work, please cite it as follows:
@inproceedings{liu2024bev,
title = {RT-BEV: Enhancing Real-Time BEV Perception for Autonomous Vehicles},
author = {Liu, Liangkai and Lee, Jinkyu and Shin, Kang G.},
booktitle = {45th IEEE Real-Time Systems Symposium},
  year      = {2024}
}

