SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving

Peizheng Li*¹,², Zhenghao Zhang*¹,⁴, David Holtz¹, Hang Yu¹,⁵, Yutong Yang¹,⁶, Yuzhi Lai², Rui Song⁷, Andreas Geiger²,³, Andreas Zell²

¹ Mercedes-Benz AG, ² University of Tübingen, ³ Tübingen AI Center, ⁴ TU Munich, ⁵ Karlsruhe Institute of Technology, ⁶ University of Stuttgart, ⁷ UCLA

(*) Equal contribution

End-to-end autonomous driving methods built on vision language models (VLMs) have developed rapidly, driven by the universal visual understanding and strong reasoning capabilities that VLMs obtain from large-scale pretraining. However, we find that current VLMs struggle to understand fine-grained 3D spatial relationships, which is a fundamental requirement for systems interacting with the physical world. To address this issue, we propose SpaceDrive, a spatial-aware VLM-based driving framework that treats spatial information as explicit positional encodings (PEs) instead of textual digit tokens, enabling joint reasoning over semantic and spatial representations. SpaceDrive applies a universal positional encoder to all 3D coordinates derived from multi-view depth estimation, historical ego-states, and text prompts. These 3D PEs are first superimposed on the corresponding 2D visual tokens to augment them. They also serve as a task-agnostic coordinate representation, replacing digit-wise numerical tokens as both inputs and outputs of the VLM. This mechanism enables the model to better index specific visual semantics during spatial reasoning and to regress trajectory coordinates directly rather than generating them digit by digit, thereby improving planning accuracy. Extensive experiments show that SpaceDrive achieves state-of-the-art open-loop performance on the nuScenes dataset and the second-best Driving Score of 78.02 among existing VLM-based methods on the Bench2Drive closed-loop benchmark.
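The idea of superimposing 3D PEs on 2D visual tokens can be illustrated with a small, dependency-free sketch. This is not the SpaceDrive implementation (the code is not yet released). The pinhole back-projection, the sinusoidal encoding, and the names `backproject`, `sinusoidal_pe_3d`, and `add_pe` are all assumptions chosen for illustration; the paper's positional encoder may differ.

```python
import math


def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame.

    Assumes a simple pinhole camera model (an illustrative choice).
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def sinusoidal_pe_3d(point, dim, max_period=10000.0):
    """Encode an (x, y, z) point as a vector of length `dim`.

    Each axis gets dim // 3 channels of interleaved sin/cos values.
    A sinusoidal encoder is assumed here as a stand-in encoder.
    """
    if dim % 6 != 0:
        raise ValueError("dim must be divisible by 6 (3 axes x sin/cos pairs)")
    half = dim // 6  # number of frequencies per axis
    pe = []
    for coord in point:
        for i in range(half):
            freq = 1.0 / (max_period ** (i / half))
            pe.append(math.sin(coord * freq))
            pe.append(math.cos(coord * freq))
    return pe


def add_pe(token, pe):
    """Superimpose a positional encoding on a visual token (element-wise sum)."""
    if len(token) != len(pe):
        raise ValueError("token and positional encoding must have equal length")
    return [t + p for t, p in zip(token, pe)]


# Hypothetical usage: one 12-dim visual token at the image centre, 10 m away.
point = backproject(320.0, 240.0, 10.0, fx=1000.0, fy=1000.0, cx=320.0, cy=240.0)
token = [0.0] * 12
spatial_token = add_pe(token, sinusoidal_pe_3d(point, dim=12))
```

Because the same encoder can be applied to any 3D coordinate, the same function could in principle encode ego-state waypoints or coordinates mentioned in a prompt, which is the sense in which the abstract calls the representation task-agnostic.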

📰 News

[2025/12/11] The paper is released on arXiv.

⌨️ Code

The code is currently under internal review and will be released soon.

🎥 Visualizations

Construction.mp4	Occluded_Intrusion.mp4	Emergency_Yield.mp4
Cyclist_Yielding.mp4	Open_Door.mp4	Nighttime_Traffic.mp4

📜 License

SpaceDrive is released under the MIT license. Please see the LICENSE file for more information.

🔗 Citation

@article{li2025spacedrive,
  title={SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving},
  author={Li, Peizheng and Zhang, Zhenghao and Holtz, David and Yu, Hang and Yang, Yutong and Lai, Yuzhi and Song, Rui and Geiger, Andreas and Zell, Andreas},
  journal={arXiv preprint arXiv:2512.10719},
  year={2025}
}
