OVA-Fields: Weakly Supervised Open-Vocabulary Affordance Fields for Robot Operational Part Detection
Heng Su · Mengying Xie · Nieqing Cao · Ding Yan · Beichen Shao · Xianlei Long · Fuqiang Gu · Chao Chen*
Paper | Video | Project Page
OVA-Fields enables robots to detect and interact with functional parts in real 3D scenes by mapping natural language commands to precise affordance locations.
News
- [6/26/2025] Our paper is accepted to ICCV 2025
- [12/12/2024] Code is released.
Create the environment and install the dependencies:

```bash
conda create -n ova python=3.10
conda activate ova
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
cd OVA-Fields
pip install -r requirements.txt
```

Build the `gridencoder` extension:

```bash
cd gridencoder
conda install nvidia/label/cuda-12.1.0::cuda-nvcc
```

You can run `which nvcc` to check where nvcc is installed, e.g. `/home/user/anaconda3/envs/ova/bin/nvcc`. Point `CUDA_HOME` at that environment prefix, then build:

```bash
export CUDA_HOME=/home/user/anaconda3/envs/ova
python setup.py install
```
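If you prefer to set `CUDA_HOME` programmatically instead of exporting it by hand, a small sketch (this helper is not part of the repo; it simply strips `bin/nvcc` off the path that `which nvcc` would report):

```python
import os
import shutil


def cuda_home_from_nvcc(nvcc_path):
    """Derive the CUDA_HOME prefix (e.g. .../envs/ova) from the
    path of the nvcc binary (e.g. .../envs/ova/bin/nvcc)."""
    return os.path.dirname(os.path.dirname(nvcc_path))


nvcc = shutil.which("nvcc")  # None if nvcc is not on PATH yet
if nvcc:
    # Only set CUDA_HOME if the user has not already exported it.
    os.environ.setdefault("CUDA_HOME", cuda_home_from_nvcc(nvcc))
```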
Our data collection is conducted using the Record3D App on iPhone/iPad Pro equipped with a LiDAR module. This app efficiently captures RGB-D video frames while recording associated data, including camera intrinsics, extrinsics, and pose information. The collected data can be exported in .r3d format, which our code can directly read and process.
You can use your own device to collect data for custom scenes. During the data collection process, please keep the camera stable and ensure the entire object is captured. This will help improve the data quality and enhance the model's performance.
You can find more information at Record3D.

Additionally, we provide the dataset used in our paper, which can be downloaded from Google Drive. You can then run demo.ipynb to quickly test the performance of our model.
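For a quick look at a capture before feeding it to the pipeline, note that a Record3D `.r3d` export is a zip archive whose top-level `metadata` entry holds JSON with the camera intrinsics and poses. A minimal sketch of reading it (this reflects Record3D's export layout, not this repo's loader, so treat the entry name as an assumption):

```python
import json
import zipfile


def load_r3d_metadata(path):
    """Read the JSON metadata (camera intrinsics, poses, ...) from a
    Record3D .r3d export, which is a zip archive with a top-level
    'metadata' entry."""
    with zipfile.ZipFile(path) as zf:
        with zf.open("metadata") as f:
            return json.load(f)
```

Usage: `meta = load_r3d_metadata("scene.r3d")`, then inspect the returned dict for the intrinsics and pose entries.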
Our pretrained model can be downloaded from Google Drive.
After setting up the required environment and obtaining the necessary data, you can train our model by following these steps:
- Modify `DATA_PATH` in `build_data.py` to point to your data directory.
- Modify `SAVE_DIRECTORY` in `train.py` to specify the path where you want to save the model.
- Run the following command to build the dataset:

  ```bash
  python build_data.py
  ```

- Run the following command to start training the model:

  ```bash
  python train.py
  ```
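Concretely, the two edits from the steps above amount to something like this (the paths are placeholders, and the surrounding code in `build_data.py` and `train.py` may differ):

```python
# In build_data.py: point at the directory holding your captures.
DATA_PATH = "/path/to/your/data"

# In train.py: where trained model checkpoints will be written.
SAVE_DIRECTORY = "/path/to/save/models"
```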
You can also easily run our code and obtain relevant visualizations by executing demo.ipynb.
We would like to extend our gratitude to CLIP-Fields, whose code has greatly supported our work.
If you find our code or paper useful, please cite
Su, H., Xie, M., Cao, N., Ding, Y., Shao, B., Long, X., ... & Chen, C. (2025). OVA-Fields: Weakly Supervised Open-Vocabulary Affordance Fields for Robot Operational Part Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 6385-6395).