Stairway to Success: An Online Floor-Aware Zero-Shot Object-Goal Navigation Framework via LLM-Driven Coarse-to-Fine Exploration
Zeying Gong1,
Rong Li1,
Tianshuai Hu2,
Ronghe Qiu1,
Lingdong Kong3,
Lingfeng Zhang4,
Guoyang Zhao1,
Yiyi Ding1,
Junwei Liang1,2,✉
1 The Hong Kong University of Science and Technology (Guangzhou).
2 The Hong Kong University of Science and Technology
3 National University of Singapore
4 Tsinghua University
- ✅ Complete Installation and Usage documentation
- ✅ Add datasets download documentation
- ✅ Release the main algorithm of ASCENT
- ❌ Release the code of real-world deployment
Assuming you have conda installed, let's prepare a conda env:
conda_env_name=ascent_nav
conda create -n $conda_env_name python=3.9 cmake=3.14.0
conda activate $conda_env_name
Install proper version of torch:
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
conda install habitat-sim=0.3.1 withbullet headless -c conda-forge -c aihabitat
If you encounter network problems, you can manually download the conda package from this link, and install it locally via:
conda install --use-local /path/to/xxx.tar.bz2
In theory, any version >= 0.2.4 is compatible, but it is best to keep habitat-lab and habitat-sim at the same version. Here we use version 0.3.1.
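The same-version requirement above can be checked with a tiny helper before launching anything; a minimal sketch (the helper name is ours, not part of the codebase):

```python
def versions_match(sim_version: str, lab_version: str) -> bool:
    """Return True when habitat-sim and habitat-lab report the same release,
    e.g. '0.3.1' == '0.3.1'. Local build suffixes after '+' are ignored."""
    return sim_version.strip().split("+")[0] == lab_version.strip().split("+")[0]

# The versions used in this repo:
print(versions_match("0.3.1", "0.3.1"))  # True
```

You could feed it `habitat_sim.__version__` and the habitat-lab version after installation.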
git clone --recurse-submodules https://github.com/Zeying-Gong/ascent.git
cd ascent/third_party/habitat-lab
git checkout v0.3.1
pip install -e habitat-lab
pip install -e habitat-baselines
cd ../..
Following GroundingDINO's instructions:
export CUDA_HOME=/path/to/cuda-11.8 # replace with actual path
cd third_party/GroundingDINO
pip install -e . --no-build-isolation --no-dependencies
cd ../..
Following MobileSAM's instructions:
cd third_party/MobileSAM
pip install -e .
cd ../..
pip install -r requirements.txt
Pin the transformers version:
pip install transformers==4.37.0

Download the required model weights and save them to the pretrained_weights/ directory:
| Model | Filename | Download Link |
|---|---|---|
| Places365 | resnet50_places365.pth.tar | Download |
| MobileSAM | mobile_sam.pt | GitHub |
| GroundingDINO | groundingdino_swint_ogc.pth | GitHub |
| D-FINE | dfine_x_obj2coco.pth | GitHub |
| RedNet | rednet_semmap_mp3d_40.pth | Google Drive |
| RAM++ | ram_plus_swin_large_14m.pth | HuggingFace |
Download the checkpoints from HuggingFace or ModelScope, and put them in pretrained_weights/.
The PointNav weight is directly from VLFM, located in third_party/vlfm/data/pointnav_weights.pth.
- Locate Weights: The pretrained_weights/ directory should look like this:
pretrained_weights
├── mobile_sam.pt
├── groundingdino_swint_ogc.pth
├── dfine_x_obj2coco.pth
├── ram_plus_swin_large_14m.pth
├── rednet_semmap_mp3d_40.pth
├── resnet50_places365.pth.tar
└── Qwen2.5-7b
├── model-00001-of-00005.safetensors
└── ...
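To catch a missing download before a long evaluation run, you can verify this layout with a short script. This is a convenience sketch, not part of the codebase; the filename list mirrors the table above:

```python
from pathlib import Path

# Expected checkpoint files from the table above.
EXPECTED = [
    "mobile_sam.pt",
    "groundingdino_swint_ogc.pth",
    "dfine_x_obj2coco.pth",
    "ram_plus_swin_large_14m.pth",
    "rednet_semmap_mp3d_40.pth",
    "resnet50_places365.pth.tar",
]

def missing_weights(root: str = "pretrained_weights") -> list:
    """Return the expected checkpoint files not found under `root`."""
    base = Path(root)
    return [name for name in EXPECTED if not (base / name).exists()]

if __name__ == "__main__":
    gone = missing_weights()
    print("all weights present" if not gone else f"missing: {gone}")
```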
- Download Scene & Episode Datasets: Follow the instructions for HM3D and MP3D in Habitat-lab's Datasets.md.
- Locate Datasets: The file structure should look like this:
data
└── datasets
├── objectnav
│ ├── hm3d
│ │ └── v1
│ │ └── val
│ │ ├── content
│ │ └── val.json.gz
│ └── mp3d
│ └── v1
│ └── val
│ ├── content
│ └── val.json.gz
└── scene_datasets
├── hm3d
│ └── ...
└── mp3d
└── ...
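As a quick sanity check on a downloaded split, you can count the episodes it contains. This assumes the standard Habitat episode format, where each `.json.gz` file holds a JSON object with an `episodes` list (episodes may be split across the `content/` directory):

```python
import gzip
import json
from pathlib import Path

def count_episodes(dataset_dir: str) -> int:
    """Count episodes across every .json.gz file under an ObjectNav split,
    assuming each file stores a JSON object with an 'episodes' list."""
    total = 0
    for f in Path(dataset_dir).rglob("*.json.gz"):
        with gzip.open(f, "rt") as fh:
            total += len(json.load(fh).get("episodes", []))
    return total
```

For example, `count_episodes("data/datasets/objectnav/hm3d/v1/val")` should return a non-zero number if the download succeeded.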
Run VLM servers
./scripts/launch_vlm_servers_ascent.sh
It will open a tmux session in a separate terminal.
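Because the servers start in the background, they may take a moment to come up. A small polling helper can gate the evaluation launch; the host and port below are placeholders (check the launch script for the actual ports):

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP port accepts connections, e.g. a VLM server.

    Returns True once a connection succeeds, False if `timeout` elapses.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False
```

Usage: `wait_for_port("127.0.0.1", 12345)` before starting the evaluation, with the port taken from the script.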
Open another terminal and run the evaluation on the HM3D dataset:
python -u -m ascent.run --config-name=eval_ascent_hm3d.yaml
Or run the evaluation on the MP3D dataset:
python -u -m ascent.run --config-name=eval_ascent_mp3d.yaml
- This is a refactored version of the original codebase with improved code organization and structure.
- Due to the inherent randomness in object detection (GroundingDINO, D-FINE) and LLM inference (Qwen2.5), evaluation results may vary slightly from the paper's reported metrics.
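To make runs as repeatable as possible, you can seed the common RNG sources before evaluation. This is a generic sketch, not a hook the codebase provides, and it cannot remove all variance (detector and LLM sampling, plus GPU nondeterminism, may still differ between runs):

```python
import os
import random

def seed_everything(seed: int = 42) -> None:
    """Seed Python, NumPy, and (if installed) torch RNGs.

    Note: this only reduces variance; sampling inside the detectors and
    the LLM can still make results differ slightly between runs.
    """
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass
```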
If you use ASCENT in your research, please use the following BibTeX entry.
@article{gong2025stairway,
title={Stairway to Success: Zero-Shot Floor-Aware Object-Goal Navigation via LLM-Driven Coarse-to-Fine Exploration},
author={Gong, Zeying and Li, Rong and Hu, Tianshuai and Qiu, Ronghe and Kong, Lingdong and Zhang, Lingfeng and Ding, Yiyi and Zhang, Leying and Liang, Junwei},
journal={arXiv preprint arXiv:2505.23019},
year={2025}
}
We would like to thank the following repositories for their contributions:
