Shanlin Sun1 ★
Yifan Wang2 ★
Hanwen Zhang3 ★
Yifeng Xiong1
Qin Ren2
Ruogu Fang4
Xiaohui Xie1
Chenyu You2
1 University of California, Irvine
2 Stony Brook University
3 Huazhong University of Science and Technology
4 University of Florida
★ Equal Contribution
While multi-step diffusion models have advanced both forward and inverse rendering, existing approaches often treat these problems independently, leading to cycle inconsistency and slow inference speed. In this work, we present Ouroboros, a framework composed of two single-step diffusion models that handle forward and inverse rendering with mutual reinforcement. Our approach extends intrinsic decomposition to both indoor and outdoor scenes and introduces a cycle consistency mechanism that ensures coherence between forward and inverse rendering outputs. Experimental results demonstrate state-of-the-art performance across diverse scenes while achieving substantially faster inference speed compared to other diffusion-based methods. We also demonstrate that Ouroboros can transfer to video decomposition in a training-free manner, reducing temporal inconsistency in video sequences while maintaining high-quality per-frame inverse rendering.
Figure: Single-step Diffusion Models for Forward and Inverse Rendering in Cycle Consistency.
Top left: Ouroboros decomposes input images into intrinsic maps (albedo, normal, roughness, metallicity, and irradiance). Given these generated intrinsic maps and textual prompts, our neural forward rendering model synthesizes images closely matching the originals.
Top right: We extend an end-to-end finetuning technique to diffusion-based neural rendering, outperforming the state-of-the-art RGB↔X in both speed and accuracy. The radar plot shows numerical comparisons on the InteriorVerse dataset.
Bottom: Our method achieves temporally consistent video inverse rendering without any finetuning on video data.
- Release inference code and checkpoints.
- Release training code.
- Release the training dataset.
- We generate masks for windows, mirrors, and other highly specular regions for our datasets so these areas do not bias training; the same masks are applied during evaluation and will ship with the data release.
- We are rebalancing checkpoints to better trade off cycle consistency and rendering quality across datasets; more checkpoints are coming soon.
- Python 3.12
- CUDA-compatible GPU
- Conda package manager
- FFmpeg (for optional video export)
- Clone the repository:

```shell
git clone https://github.com/Y-Research-SBU/Ouroboros.git
cd Ouroboros
```

- Create and activate the conda environment:

```shell
conda env create -f environment.yml
conda activate ouroboros
```

Estimate material properties from RGB images:
```shell
python rgb2x/inference.py \
    --checkpoint="path/to/checkpoint" \
    --modality "normals" "albedo" "irradiance" "roughness" "metallicity" \
    --condition "rgb" \
    --noise "gaussian" \
    --input_rgb_path="path/to/input.jpg" \
    --output_dir="path/to/output"
```

Generate RGB images from material properties:
```shell
python x2rgb/inference.py \
    --checkpoint="path/to/checkpoint" \
    --modality "rgb" \
    --condition "normals" "albedo" "irradiance" "roughness" "metallicity" \
    --noise "gaussian" \
    --prompt="your text prompt" \
    --albedo_path="path/to/albedo.png" \
    --normal_path="path/to/normal.png" \
    --roughness_path="path/to/roughness.png" \
    --metallic_path="path/to/metallic.png" \
    --irradiance_path="path/to/irradiance.png" \
    --output_dir="path/to/output"
```

Generate temporally consistent material property videos from an RGB frame sequence:
```shell
python video_inference.py \
    --checkpoint_path "path/to/checkpoint" \
    --input_dir "" \
    --save_dir "" \
    --num_frames 198 \
    --window_size 32 \
    --stride 16 \
    --required_aovs albedo \
    --device cuda \
    --half_precision
```

Notes:
- `input_dir` must contain sequentially numbered frames starting at `0.jpg`.
- The environment configuration for video inference should follow the YAML file in the `video_infer` folder.
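Because a naming gap in `input_dir` will silently break the expected `0.jpg`, `1.jpg`, ... ordering, a small pre-flight check can catch it before a long run. The helper below is an illustration, not part of the repository:

```python
import os

def check_frame_dir(input_dir: str) -> int:
    """Return the frame count if files are exactly 0.jpg .. N-1.jpg, else raise."""
    names = [f for f in os.listdir(input_dir) if f.endswith(".jpg")]
    expected = {f"{i}.jpg" for i in range(len(names))}
    if set(names) != expected:
        missing = sorted(expected - set(names))
        raise ValueError(f"frames are not numbered sequentially from 0.jpg; missing: {missing}")
    return len(names)
```

Run it before launching `video_inference.py`; the returned count is presumably what `--num_frames` should be set to.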
Common arguments:

- `--checkpoint`: Path to the model checkpoint or a Hugging Face model name
- `--modality`: List of modalities to generate/estimate
- `--condition`: List of conditioning modalities
- `--noise`: Noise type (`gaussian`, `pyramid`, or `zeros`)
- `--denoise_steps`: Number of denoising steps
Inverse rendering (rgb2x):

- `--input_rgb_path`: Path to the input RGB image
- `--output_dir`: Output directory for material properties
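To decompose a whole folder of images, the single-image command can be wrapped in a loop. The sketch below only prints the commands (a dry run); the checkpoint path, directory names, and modality choice are placeholders, not repository defaults:

```shell
# Build one rgb2x inference command per input image (printed, not executed).
make_rgb2x_cmd() {
  printf 'python rgb2x/inference.py --checkpoint="%s" --modality "albedo" "normals" --condition "rgb" --noise "gaussian" --input_rgb_path="%s" --output_dir="%s"\n' \
    "$1" "$2" "$3"
}

for img in inputs/*.jpg; do
  [ -e "$img" ] || continue   # glob matched nothing; skip
  make_rgb2x_cmd "checkpoints/rgb2x" "$img" "outputs/$(basename "$img" .jpg)"
done
```

Once the paths are correct, pipe the output through `sh` (or call the script directly instead of `printf`).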
Forward rendering (x2rgb):

- `--prompt`: Text prompt for generation
- `--albedo_path`: Path to the albedo image
- `--normal_path`: Path to the normal map
- `--roughness_path`: Path to the roughness map
- `--metallic_path`: Path to the metallic map
- `--irradiance_path`: Path to the irradiance map
- `--output_dir`: Output directory for the generated RGB image
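A quick way to probe cycle consistency on your own data is to compare the original image with the x2rgb re-rendering of its predicted intrinsics. The snippet below is a generic image-space check, assuming only that both files are RGB images (it requires numpy and Pillow; file paths are up to you):

```python
import numpy as np
from PIL import Image

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    """Peak signal-to-noise ratio between two uint8 RGB arrays of equal shape."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

def cycle_psnr(original_path: str, rerendered_path: str) -> float:
    """PSNR between the input image and its re-rendered counterpart."""
    orig = Image.open(original_path).convert("RGB")
    rend = Image.open(rerendered_path).convert("RGB").resize(orig.size)
    return psnr(np.asarray(orig), np.asarray(rend))
```

Higher PSNR between input and re-rendering indicates a tighter forward/inverse cycle.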
For questions, feedback, or collaboration opportunities, please contact:
Email: shanlins@uci.edu, chenyu.you@stonybrook.edu
If you use this code in your research, please cite:

```bibtex
@article{sun2025ouroboros,
  title={Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering},
  author={Sun, Shanlin and Wang, Yifan and Zhang, Hanwen and Xiong, Yifeng and Ren, Qin and Fang, Ruogu and Xie, Xiaohui and You, Chenyu},
  journal={arXiv preprint arXiv:2508.14461},
  year={2025}
}
```

This project is licensed under the MIT License; see the LICENSE file for details.

