@inproceedings{deng20253dllava,
title={3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer},
author={Deng, Jiajun and He, Tianyu and Jiang, Li and Wang, Tianyu and Dayoub, Feras and Reid, Ian},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025}
}
We provide a Docker image to run 3D-LLaVA. Please run the following command to pull it:
docker pull djiajun1206/3d-llava-slim
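If it helps, a minimal sketch for starting a container from this image is shown below. The GPU, shared-memory, and volume flags are standard Docker options rather than repository-specific settings, and the mount path is a placeholder for wherever the project root lives on your machine:

# Sketch: launch an interactive container with GPU access and the project mounted.
# "/path/to/3D-LLaVA" is a placeholder; adjust it to your local checkout.
docker run --gpus all --shm-size 16g -it \
    -v /path/to/3D-LLaVA:/workspace/3D-LLaVA \
    djiajun1206/3d-llava-slim /bin/bash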
We conduct experiments with the scan data from ScanNet, together with the text descriptions from ScanRefer, ScanQA, SQA3D, ReferIt3D, and Multi3DRefer. For convenient access, we provide the processed data. The data should be placed in ./playground, with the following structure:
3D-LLaVA                       # project root
├── playground
│   ├── data
│   │   ├── scannet
│   │   │   ├── super_points
│   │   │   ├── train
│   │   │   ├── val
│   │   │   └── scannet_axis_align_matrix_trainval.pkl
│   │   ├── train_info
│   │   │   ├── scanqa_train_3d_llava.json
│   │   │   ├── sqa3d_train_3d_llava.json
│   │   │   ├── scan2cap_train_3d_llava.json
│   │   │   └── ...
│   │   └── eval_info
│   │       ├── scanqa
│   │       ├── sqa3d
│   │       ├── densecap_scanrefer
│   │       └── ...
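To sanity-check the layout before training, a small shell sketch like the following can be used; the paths simply mirror the tree above:

# Check that the expected data directories are in place under ./playground.
for d in playground/data/scannet/super_points \
         playground/data/train_info \
         playground/data/eval_info; do
    [ -d "$d" ] && echo "found:   $d" || echo "missing: $d"
done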
We adopt LoRA tuning by default. Please train 3D-LLaVA with:
./scripts/train/finetune-3d-llava-lora.sh
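For example, to launch the script on a specific subset of GPUs (CUDA_VISIBLE_DEVICES is a standard CUDA environment variable, not a script-specific flag):

CUDA_VISIBLE_DEVICES=0,1,2,3 bash ./scripts/train/finetune-3d-llava-lora.sh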
We provide scripts to evaluate our model on ScanQA, SQA3D, Scan2Cap, ScanRefer, and Multi3DRefer. Please run:
./scripts/eval/multigpu_eval_sqa3d.sh
./scripts/eval/multigpu_eval_scanqa.sh
./scripts/eval/multigpu_eval_scan2cap.sh
./scripts/eval/multigpu_eval_scanrefer.sh
./scripts/eval/multigpu_eval_multi3drefer.sh
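To run all benchmarks back to back, a simple loop over the provided scripts works; the task names below correspond exactly to the script names listed above:

# Evaluate on all five benchmarks sequentially.
for task in sqa3d scanqa scan2cap scanrefer multi3drefer; do
    bash ./scripts/eval/multigpu_eval_${task}.sh
done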
Thanks to the following great repositories: LLaVA, PonderV2, and OneFormer3D.
