
Stairway to Success: An Online Floor-Aware Zero-Shot Object-Goal Navigation Framework via LLM-Driven Coarse-to-Fine Exploration

Zeying Gong1, Rong Li1, Tianshuai Hu2, Ronghe Qiu1, Lingdong Kong3,
Lingfeng Zhang4, Guoyang Zhao1, Yiyi Ding1, Junwei Liang1,2,✉

1 The Hong Kong University of Science and Technology (Guangzhou).   
2 The Hong Kong University of Science and Technology   
3 National University of Singapore   
4 Tsinghua University   


📋 TODO List

  • ✅ Complete Installation and Usage documentation
  • ✅ Add datasets download documentation
  • ✅ Release the main algorithm of ASCENT
  • ❌ Release the code of real-world deployment

🛠️ Environment Setup

1. Preparing Conda Environment

Assuming you have conda installed, let's prepare a conda env:

conda_env_name=ascent_nav
conda create -n $conda_env_name python=3.9 cmake=3.14.0
conda activate $conda_env_name

Install a compatible version of PyTorch (CUDA 11.8 build):

pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
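After installation, a quick sanity check (a hypothetical snippet, not part of the repo) can confirm that the CUDA build of torch imports correctly:

```python
# Hypothetical sanity check: confirms the CUDA-enabled torch wheel works.
def cuda_summary():
    try:
        import torch  # installed by the pip command above
    except ImportError:
        return "torch not installed"
    return f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"

print(cuda_summary())
```

If the environment is set up correctly, this should report `CUDA available: True`.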

2. Install Habitat-Sim

conda install habitat-sim=0.3.1 withbullet headless -c conda-forge -c aihabitat

If you encounter network problems, you can manually download the conda package from this link and install it locally via: conda install --use-local /path/to/xxx.tar.bz2.

In theory, any version >= 0.2.4 is compatible, but it is best to keep habitat-lab and habitat-sim at the same version. Here we use version 0.3.1.

3. Clone Repository

git clone --recurse-submodules https://github.com/Zeying-Gong/ascent.git

4. Install Habitat-Lab

cd third_party/habitat-lab
git checkout v0.3.1
pip install -e habitat-lab
pip install -e habitat-baselines
cd ../..

5. GroundingDINO

Follow GroundingDINO's installation instructions:

export CUDA_HOME=/path/to/cuda-11.8 # replace with actual path

cd third_party/GroundingDINO
pip install -e . --no-build-isolation --no-deps
cd ../..

6. MobileSAM

Follow MobileSAM's installation instructions:

cd third_party/MobileSAM
pip install -e .
cd ../..

7. Others

pip install -r requirements.txt

Additionally, pin the transformers version required by the codebase:

pip install transformers==4.37.0

🏋️ Downloading Model Weights

Download the required model weights and save them to the pretrained_weights/ directory:

| Model | Filename | Download Link |
|-------|----------|---------------|
| Places365 | resnet50_places365.pth.tar | Download |
| MobileSAM | mobile_sam.pt | GitHub |
| GroundingDINO | groundingdino_swint_ogc.pth | GitHub |
| D-FINE | dfine_x_obj2coco.pth | GitHub |
| RedNet | rednet_semmap_mp3d_40.pth | Google Drive |
| RAM++ | ram_plus_swin_large_14m.pth | HuggingFace |

Qwen2.5-7B Weights

Download the checkpoints from HuggingFace or ModelScope and put them in pretrained_weights/.

PointNav Weights

The PointNav weights come directly from VLFM and are located at third_party/vlfm/data/pointnav_weights.pth.

  • Locate Weights: The file structure should look like this:
pretrained_weights
├── mobile_sam.pt
├── groundingdino_swint_ogc.pth
├── dfine_x_obj2coco.pth
├── ram_plus_swin_large_14m.pth
├── rednet_semmap_mp3d_40.pth
├── resnet50_places365.pth.tar
└── Qwen2.5-7b
    ├── model-00001-of-00005.safetensors
    └── ...
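The layout above can be verified with a small script (a hypothetical helper, not part of the repo) that lists any missing weight files:

```python
from pathlib import Path

# Required weight filenames, taken from the table above.
REQUIRED_WEIGHTS = [
    "mobile_sam.pt",
    "groundingdino_swint_ogc.pth",
    "dfine_x_obj2coco.pth",
    "ram_plus_swin_large_14m.pth",
    "rednet_semmap_mp3d_40.pth",
    "resnet50_places365.pth.tar",
]

def missing_weights(root="pretrained_weights"):
    """Return the required weight files not present under `root`."""
    root = Path(root)
    return [name for name in REQUIRED_WEIGHTS if not (root / name).exists()]

print(missing_weights())
```

An empty list means all the listed weights are in place; the Qwen2.5-7b directory can be checked the same way.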

📚 Datasets Setup

  • Download Scene & Episode Datasets: Follow the instructions for HM3D and MP3D in Habitat-Lab's Datasets.md.

  • Locate Datasets: The file structure should look like this:

data
└── datasets
    ├── objectnav
    │   ├── hm3d
    │   │   └── v1
    │   │        └── val
    │   │             ├── content
    │   │             └── val.json.gz
    │   └── mp3d
    │       └── v1
    │            └── val
    │                 ├── content
    │                 └── val.json.gz
    └── scene_datasets
        ├── hm3d
        │   └── ...
        └── mp3d
            └── ...
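As a quick integrity check on the episode datasets, the episodes in a split file can be counted with the standard library (a hypothetical helper; it assumes the usual Habitat episode format with a top-level "episodes" list):

```python
import gzip
import json

def count_episodes(path):
    """Count episodes in a gzipped Habitat ObjectNav split file."""
    with gzip.open(path, "rt") as f:
        return len(json.load(f).get("episodes", []))

# e.g. count_episodes("data/datasets/objectnav/hm3d/v1/val/val.json.gz")
```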

🚀 Evaluation

Run VLM servers

./scripts/launch_vlm_servers_ascent.sh

This will open a tmux session in a separate terminal window.

Open another terminal, run evaluation on HM3D dataset:

python -u -m ascent.run --config-name=eval_ascent_hm3d.yaml

Or run evaluation on MP3D dataset:

python -u -m ascent.run --config-name=eval_ascent_mp3d.yaml
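To evaluate both benchmarks back to back, the two invocations above can be wrapped in a short script (a hypothetical convenience wrapper, not part of the repo):

```python
import subprocess

# Config names taken from the evaluation commands above.
CONFIGS = ["eval_ascent_hm3d.yaml", "eval_ascent_mp3d.yaml"]

def eval_cmd(config_name):
    # Mirrors the manual invocation: python -u -m ascent.run --config-name=<cfg>
    return ["python", "-u", "-m", "ascent.run", f"--config-name={config_name}"]

# To run both evaluations sequentially (requires the VLM servers to be up):
# for cfg in CONFIGS:
#     subprocess.run(eval_cmd(cfg), check=True)
```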

⚠️ Notes

  • This is a refactored version of the original codebase with improved code organization and structure.
  • Due to the inherent randomness in object detection (GroundingDINO, D-FINE) and LLM inference (Qwen2.5), evaluation results may vary slightly from the paper's reported metrics.

✒️ Citation

If you use ASCENT in your research, please use the following BibTeX entry.

@article{gong2025stairway,
  title={Stairway to Success: Zero-Shot Floor-Aware Object-Goal Navigation via LLM-Driven Coarse-to-Fine Exploration},
  author={Gong, Zeying and Li, Rong and Hu, Tianshuai and Qiu, Ronghe and Kong, Lingdong and Zhang, Lingfeng and Ding, Yiyi and Zhang, Leying and Liang, Junwei},
  journal={arXiv preprint arXiv:2505.23019},
  year={2025}
}

🙏 Acknowledgments

We would like to thank the following repositories for their contributions:

About

[RAL‘26] Stairway to Success: An Online Floor-Aware Zero-Shot Object-Goal Navigation Framework via LLM-Driven Coarse-to-Fine Exploration
