[ICLR 2026] RoRE: Rotary Ray Embedding for Generalised Multi-Modal Scene Understanding

RoboticImaging/RoRE

RoRE: Rotary Ray Embedding for Generalised Multi-Modal Scene Understanding

ICLR 2026

Ryan Griffiths, Donald G. Dansereau

Project Page, Paper

Setup

Environment

A requirements.txt and a Docker file are provided:

pip install -r requirements.txt 

or

bash docker_build.sh

Data

For RealEstate10K, we use the same data format as pixelSplat. Please follow the data formatting instructions provided there. You can also download a preprocessed dataset here. The dataset can be left in the zip file and loaded directly from it.
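As a rough illustration of what loading directly from a zip file can look like, the sketch below reads archive members into memory with Python's standard zipfile module, without extracting anything to disk. It is not the repository's data loader: the helper names list_zip_entries and read_zip_entry and the member name train/chunk_0.bin are hypothetical, and the actual layout inside the RealEstate10K archive is not assumed here.

```python
import io
import zipfile


def list_zip_entries(zip_path, suffix=""):
    """Return the names of archive members ending with `suffix`, without extracting."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        return [name for name in archive.namelist() if name.endswith(suffix)]


def read_zip_entry(zip_path, member):
    """Read one archive member into memory as bytes, without extracting to disk."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        with archive.open(member) as handle:
            return handle.read()


if __name__ == "__main__":
    # Build a tiny in-memory archive so the sketch runs without the real dataset.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Hypothetical member name, used only for this demonstration.
        archive.writestr("train/chunk_0.bin", b"example-bytes")

    print(list_zip_entries(buffer, suffix=".bin"))
    print(read_zip_entry(buffer, "train/chunk_0.bin"))
```

Both helpers accept either a filesystem path such as /path/to/re10k.zip or an open file-like object, since zipfile.ZipFile supports both.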

A subset of our synthetic multimodal dataset can be found here.

Usage

Training

Training is done as follows.

For training rgb on RealEstate10K:

bash ./nvs.sh --ray_encoding plucker --pos_enc rore --gpus "0,1,..." --dataset_path /path/to/re10k.zip

For training rgb-thermal on our synthetic rgb-thermal dataset:

bash ./nvs.sh --ray_encoding plucker --pos_enc rore --gpus "0,1,..." --dataset_path /path/to/MultimodalBlender --dataset_type multimodal

Alternative embedding methods can be selected by changing the values passed to --ray_encoding and --pos_enc.
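For readers unfamiliar with the plucker option, the sketch below shows the standard Plücker parameterisation of a ray: a unit direction d together with the moment o × d, where o is any point on the ray. This is a generic textbook construction, not code from this repository. The function names and the (direction, moment) ordering of the 6-vector are assumptions, and the repository may order or normalise the components differently.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def plucker_ray(origin, direction):
    """Return the 6-tuple (d, o x d), with d normalised to unit length.

    The (direction, moment) ordering is an assumption made for this sketch.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0.0:
        raise ValueError("ray direction must be non-zero")
    d = tuple(c / norm for c in direction)
    moment = cross(origin, d)
    return d + moment


if __name__ == "__main__":
    # The moment does not change when the origin slides along the ray,
    # so both calls describe the same line.
    print(plucker_ray((1.0, 0.0, 0.0), (0.0, 0.0, 2.0)))
    print(plucker_ray((1.0, 0.0, 5.0), (0.0, 0.0, 2.0)))
```

The representation depends only on the line itself, not on which point along it is chosen as the origin.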

Validation

Validation at different zoom-in (focal length) factors can be run via:

bash ./nvs.sh --ray_encoding plucker --pos_enc rore --gpus "0,1,..." --test-zoom-in "2" --dataset_path "/path/to/re10k.zip"
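To make the link between a zoom-in factor and focal length concrete, the sketch below scales pinhole intrinsics by a zoom factor and reports the resulting horizontal field of view. It assumes that zooming in by a factor z multiplies the focal length by z, which matches centre-cropping the image by z and resizing it back when the principal point sits at the image centre. Whether --test-zoom-in is implemented this way is an assumption, and the function names are hypothetical.

```python
import math


def zoom_intrinsics(fx, fy, cx, cy, zoom):
    """Scale the focal lengths by `zoom`, keeping the principal point fixed.

    Assumes the principal point (cx, cy) lies at the image centre.
    """
    if zoom <= 0:
        raise ValueError("zoom factor must be positive")
    return (fx * zoom, fy * zoom, cx, cy)


def horizontal_fov_degrees(fx, width):
    """Horizontal field of view, in degrees, of a pinhole camera."""
    return math.degrees(2.0 * math.atan(width / (2.0 * fx)))


if __name__ == "__main__":
    width = 128
    fx, fy, cx, cy = 64.0, 64.0, 64.0, 64.0  # illustrative values only
    zoomed = zoom_intrinsics(fx, fy, cx, cy, zoom=2.0)
    print("original FOV:", horizontal_fov_degrees(fx, width))
    print("zoomed FOV:", horizontal_fov_degrees(zoomed[0], width))
```

A larger zoom factor gives a longer focal length and therefore a narrower field of view.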

Validation with different synthetic distortion on RealEstate10K can be run via:

bash ./nvs.sh --ray_encoding plucker --pos_enc rore --gpus "0,1,..." --test-distortion "2" --dataset_path "/path/to/re10k.zip"

Testing only on the multimodal dataset:

bash ./nvs.sh --ray_encoding plucker --pos_enc rore --gpus "0,1,..." --dataset_path "/path/to/MultimodalBlender" --dataset_type multimodal --test-only

NOTE: For simplicity, the same network architecture is used here for rgb and rgb-thermal. In the paper, the multimodal network differed slightly from the rgb-only network.

Pretrained Weights (RealEstate10K)

Download                   # Params   PSNR    SSIM     LPIPS
RE10K_12layers_768dim.pt   48M        28.79   0.8824   0.0483
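For reference, PSNR is conventionally computed from the mean squared error between a rendered image and its ground truth. The sketch below uses the standard definition, 10 · log10(MAX² / MSE), on flat lists of pixel values in [0, 1]. It is not the evaluation code used to produce the numbers above, and details such as the value range or per-image averaging are assumptions. SSIM and LPIPS are not sketched here.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - p) ** 2 for r, p in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ground_truth = [0.0, 0.5, 1.0, 0.25]   # illustrative pixel values
    prediction = [0.1, 0.4, 0.9, 0.35]
    print(psnr(ground_truth, prediction))
```

Higher values indicate a closer match to the ground truth.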

Acknowledgement

This repository is built on top of the LVSM and PRoPE repositories. We thank their authors for making their work open source.

Citation

If you find this work useful, please consider citing it:

@inproceedings{rore2026,
  title={RoRE: Rotary Ray Embedding for Generalised Multi-Modal Scene Understanding},
  author={Ryan Griffiths and Donald G. Dansereau},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=BR2ItBcqOo}
}
