Jiaxin Huang, Yuanbo Yang, Bangbang Yang, Lin Ma, Yuewen Ma, Yiyi Liao
TL;DR: Gen3R generates geometry together with RGB from images via a unified latent space that aligns geometry and appearance.
We train and test our model under the following environment:
- Debian GNU/Linux 12 (bookworm)
- NVIDIA H20 (96 GB)
- CUDA 12.4
- Python 3.11
- PyTorch 2.5.1+cu124
- Clone this repository

```bash
git clone https://github.com/JaceyHuang/Gen3R
cd Gen3R
```

- Install packages

```bash
conda create -n gen3r python=3.11.2 -y
conda activate gen3r
pip install -r requirements.txt
```

- (Important) Download the pretrained Gen3R checkpoint from HuggingFace to `./checkpoints`

```bash
sudo apt install git-lfs
git lfs install
git clone https://huggingface.co/JaceyH919/Gen3R ./checkpoints
```

- Note: at present, loading weights directly from HuggingFace via `from_pretrained("JaceyH919/Gen3R")` is not supported due to module naming errors. Please download the checkpoint locally and load it with `from_pretrained("./checkpoints")`.
Run the Python script `infer.py` as follows to test the examples:
```bash
python infer.py \
    --pretrained_model_name_or_path ./checkpoints \
    --task 2view \
    --prompts examples/2-view/colosseum/prompts.txt \
    --frame_path examples/2-view/colosseum/first.png examples/2-view/colosseum/last.png \
    --cameras free \
    --output_dir ./results \
    --remove_far_points
```

Some important inference settings:

- `--task`: `1view` for First Frame to 3D, `2view` for First-last Frames to 3D, and `allview` for 3D Reconstruction.
- `--prompts`: the text prompt string or the path to the text prompt file.
- `--frame_path`: the path to the conditional images/video. For the `allview` task, this can be either the path to a folder containing all frames or the path to the conditional video; for the other two tasks, it should be the path to the conditional image(s).
- `--cameras`: the path to the conditional camera extrinsics and intrinsics. We also provide basic trajectories by setting this argument to `zoom_in`, `zoom_out`, `arc_left`, `arc_right`, `translate_up`, or `translate_down`; in this case, we first use VGGT to estimate the initial camera intrinsics and scene scale. To disable camera conditioning, set this argument to `free`.
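As a rough illustration of what a preset trajectory like `zoom_out` encodes, the sketch below builds a sequence of camera poses that move the camera backwards along its viewing axis. The matrix convention, frame count, and step size here are assumptions for illustration only; they are not the exact format that `infer.py` consumes.

```python
import numpy as np

def zoom_out_trajectory(num_frames: int = 8, step: float = 0.25) -> np.ndarray:
    """Toy zoom-out path: the camera starts at the origin looking down -z
    (an assumed OpenGL-style convention) and moves along +z each frame,
    i.e. away from the scene."""
    poses = []
    for i in range(num_frames):
        c2w = np.eye(4)            # camera-to-world pose, identity rotation
        c2w[2, 3] = i * step       # translate the camera along +z
        poses.append(c2w)
    return np.stack(poses)         # shape: (num_frames, 4, 4)

poses = zoom_out_trajectory()
print(poses.shape)       # (8, 4, 4)
print(poses[-1][2, 3])   # 1.75 -- the farthest camera
```

Other presets (`arc_left`, `translate_up`, ...) would vary rotation or translation analogously; the actual scene scale is estimated by VGGT as described above.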
Note that the default resolution of our model is 560×560. If the resolution of the conditioning images or videos differs from this, we first apply resizing followed by center cropping to match the required resolution.
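The resize-then-center-crop preprocessing described above can be sketched as follows. This is a minimal stand-alone version using Pillow; the model's actual preprocessing may differ in interpolation and rounding details.

```python
from PIL import Image

TARGET = 560  # default model resolution (560x560)

def resize_center_crop(img: Image.Image, size: int = TARGET) -> Image.Image:
    """Resize so the shorter side matches `size`, then center-crop to size x size."""
    w, h = img.size
    scale = size / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)), Image.BICUBIC)
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2
    return img.crop((left, top, left + size, top + size))

out = resize_center_crop(Image.new("RGB", (1280, 720)))
print(out.size)  # (560, 560)
```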
More examples:
- First Frame to 3D

```bash
python infer.py \
    --pretrained_model_name_or_path ./checkpoints \
    --task 1view \
    --prompts examples/1-view/prompts.txt \
    --frame_path examples/1-view/knossos.png \
    --cameras zoom_out \
    --output_dir ./results
```

- First-last Frames to 3D

```bash
python infer.py \
    --pretrained_model_name_or_path ./checkpoints \
    --task 2view \
    --prompts examples/2-view/bedroom/prompts.txt \
    --frame_path examples/2-view/bedroom/first.png examples/2-view/bedroom/last.png \
    --cameras examples/2-view/bedroom/cameras.json \
    --output_dir ./results
```

- 3D Reconstruction (note that `--cameras` is ignored in this task)

```bash
python infer.py \
    --pretrained_model_name_or_path ./checkpoints \
    --task allview \
    --prompts examples/all-view/prompts.txt \
    --frame_path examples/all-view/garden.mp4 \
    --output_dir ./results
```

TODO:

- Release inference code and checkpoints
- Release online demo
- Release training code & dataset preparation
Please cite our paper if you find this repository useful:
```bibtex
@misc{huang2026gen3r3dscenegeneration,
      title={Gen3R: 3D Scene Generation Meets Feed-Forward Reconstruction},
      author={Jiaxin Huang and Yuanbo Yang and Bangbang Yang and Lin Ma and Yuewen Ma and Yiyi Liao},
      year={2026},
      eprint={2601.04090},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.04090},
}
```