RoRE: Rotary Ray Embedding for Generalised Multi-Modal Scene Understanding

ICLR 2026

Setup

Environment

A requirements.txt and a Dockerfile are provided. Either install the dependencies with pip:

pip install -r requirements.txt

or build the Docker image:

bash docker_build.sh

Data

For RealEstate10K we use the same data format as pixelSplat. Please follow the data formatting instructions provided there. You can also download a preprocessed dataset here. The dataset can be left as a zip file and loaded directly from it.
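Reading data straight out of a zip archive, without unpacking it first, might look like the following minimal sketch. It uses only Python's standard `zipfile` module. The helper names `list_members` and `read_member` are hypothetical, and the repository's own data loader may work differently.

```python
import zipfile


def list_members(zip_path, suffix=""):
    """Return the names of archive members ending with `suffix`, without extracting."""
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist() if name.endswith(suffix)]


def read_member(zip_path, member):
    """Read one member's raw bytes straight from the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as handle:
            return handle.read()
```

The returned bytes can then be passed to whichever decoder matches the stored file type.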

A subset of our synthetic multimodal dataset can be found here.

Usage

Training

Training is done as follows.

To train an RGB model on RealEstate10K:

bash ./nvs.sh --ray_encoding plucker --pos_enc rore --gpus "0,1,..." --dataset_path /path/to/re10k.zip

To train an RGB-thermal model on our synthetic RGB-thermal dataset:

bash ./nvs.sh --ray_encoding plucker --pos_enc rore --gpus "0,1,..." --dataset_path /path/to/MultimodalBlender --dataset_type multimodal

Alternative embedding methods can be selected by changing the --ray_encoding and --pos_enc arguments.
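For context on the `--ray_encoding plucker` option used above: a Plücker encoding usually represents a ray with origin o and direction d as the 6-vector (d, o × d), which does not change when the origin slides along the ray. The sketch below follows that textbook definition in plain Python. The function name, the (direction, moment) ordering, and the normalisation of d are assumptions, not necessarily what this repository implements.

```python
import math


def plucker_ray(origin, direction):
    """Return the 6-D Plücker coordinates (d, o x d) of a ray, with d unit-normalised."""
    ox, oy, oz = origin
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    dx, dy, dz = (c / norm for c in direction)
    # Moment m = o x d; it is the same for every point o on the ray.
    moment = (oy * dz - oz * dy, oz * dx - ox * dz, ox * dy - oy * dx)
    return (dx, dy, dz) + moment
```

For example, the origins (1, 0, 0) and (1, 0, 5) lie on the same ray along the z axis, so `plucker_ray` gives identical coordinates for both.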

Validation

Validation at different zoom-in (focal length) factors can be run with:

bash ./nvs.sh --ray_encoding plucker --pos_enc rore --gpus "0,1,..." --test-zoom-in "2" --dataset_path "/path/to/re10k.zip"
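One common way to model a zoom-in factor is to scale the focal lengths of the pinhole intrinsics while leaving the principal point where it is. The sketch below illustrates that idea only. `zoom_intrinsics` is a hypothetical helper, and how `--test-zoom-in` is actually applied inside this repository is an assumption that should be checked against the source.

```python
def zoom_intrinsics(fx, fy, cx, cy, factor):
    """Scale focal lengths by `factor`; the principal point (cx, cy) is unchanged."""
    if factor <= 0:
        raise ValueError("zoom factor must be positive")
    return fx * factor, fy * factor, cx, cy
```

Under this model a factor of 2 doubles both focal lengths, which narrows the field of view.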

Validation under different levels of synthetic distortion on RealEstate10K can be run with:

bash ./nvs.sh --ray_encoding plucker --pos_enc rore --gpus "0,1,..." --test-distortion "2" --dataset_path "/path/to/re10k.zip"

To run testing only (no training) on the multimodal dataset:

bash ./nvs.sh --ray_encoding plucker --pos_enc rore --gpus "0,1,..." --dataset_path "/path/to/MultimodalBlender" --dataset_type multimodal --test-only

NOTE: For simplicity, the same network architecture is used here for RGB and RGB-thermal. In the paper, the multimodal network differed slightly from the RGB-only network.

Pretrained Weights (RealEstate10K)

Download	# Params	PSNR	SSIM	LPIPS
RE10K_12layers_768dim.pt	48M	28.79	0.8824	0.0483

Acknowledgement

This repository is built on top of the LVSM and PRoPE repositories. We thank their authors for making their work open source.

Citation

If you find this work useful, please consider citing our work:

@inproceedings{rore2026,
  title={RoRE: Rotary Ray Embedding for Generalised Multi-Modal Scene Understanding},
  author={Ryan Griffiths and Donald G. Dansereau},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=BR2ItBcqOo}
}
