Scaling Down NeRF with NeRF2D

For more information, see the Blog Post

Code for NeRF2D, a 2D analogue of NeRF designed to facilitate experimentation with Neural Radiance Fields and novel view synthesis algorithms.

NeRF2D is trained on 1D views of a 2D scene and learns to reconstruct a 2D radiance field. This is the direct analogue of 3D novel view synthesis, but it requires far less compute and is much easier to understand and visualize:

[Figure: 1D views of a 2D scene, the 2D analogue of multi-view images of a 3D scene]

We show that NeRF can be reformulated in 2D by reconstructing a 2D shape from 1D views of it. Fitting a 2D NeRF is very fast, and we propose this as a viable toy setting for quick experimentation on NeRF.
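To make the reformulation concrete, here is a minimal sketch of volume rendering for a single pixel of a 1D view, adapting NeRF's quadrature rule to 2D. The `field` callable and all names are illustrative assumptions, not this repository's API:

```python
# Minimal sketch of 2D volume rendering; names and shapes are illustrative,
# not the exact API of this repository.
import torch

def render_ray_2d(field, origin, direction, near=0.1, far=4.0, n_samples=64):
    """Render one pixel of a 1D view by marching a ray through 2D space.

    field: callable mapping (N, 2) xy-coordinates to (density (N,), rgb (N, 3)).
    """
    # Sample points along the ray, as in NeRF's quadrature approximation
    t = torch.linspace(near, far, n_samples)
    pts = origin + t[:, None] * direction          # (n_samples, 2)
    sigma, rgb = field(pts)

    # Convert densities to alpha values over each ray segment
    delta = t[1:] - t[:-1]
    delta = torch.cat([delta, delta[-1:]])
    alpha = 1.0 - torch.exp(-sigma * delta)

    # Transmittance: probability the ray reaches each sample unoccluded
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10]), dim=0)[:-1]
    weights = alpha * trans                        # (n_samples,)

    return (weights[:, None] * rgb).sum(dim=0)     # composited pixel color
```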

Generating 2D Novel View Synthesis Datasets

To train a 2D NeRF, we need a 1D multi-view dataset. Since these are not readily available, we include a Blender add-on for rendering 1D images of an object:

[Animation: rendering 1D views of an object in Blender]
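As a rough illustration of the idea (a sketch, not the add-on's actual code), a script along these lines, run inside Blender, renders 1-pixel-tall images from cameras placed on a circle around the object:

```python
# Hypothetical sketch: render 1-pixel-tall images from cameras on a circle
# around the object. Run inside Blender; paths and counts are assumptions.
import math
import bpy
from mathutils import Vector

scene = bpy.context.scene
scene.render.resolution_x = 100   # width of each 1D view
scene.render.resolution_y = 1     # a single row of pixels

cam = scene.camera
target = Vector((0.0, 0.0, 0.0))  # assume the object sits at the origin
n_views, radius = 50, 3.0

for i in range(n_views):
    angle = 2 * math.pi * i / n_views
    cam.location = Vector((radius * math.cos(angle),
                           radius * math.sin(angle), 0.0))
    # Point the camera at the object
    cam.rotation_euler = (target - cam.location).to_track_quat('-Z', 'Y').to_euler()
    scene.render.filepath = f"//views/train_{i:03d}.png"
    bpy.ops.render.render(write_still=True)
```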

With the add-on, we can generate training, validation, and testing datasets for any object, with different distributions of camera poses.

[Figure: an example scene with its generated camera poses]

Since each training view is just a single row of pixels, we can visualize all of the views together by stacking them, one view per row, into a single image. For the example scene above, we get the following:

[Figure: all training views stacked into a single image]
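A visualization like this can be produced with a few lines of NumPy and matplotlib; the file names below follow the hypothetical script above:

```python
# Stack the 1D views as rows of a single image. The file layout is an
# assumption, not necessarily the repository's exact format.
import numpy as np
import imageio.v3 as iio
import matplotlib.pyplot as plt

views = [iio.imread(f"views/train_{i:03d}.png") for i in range(50)]
stacked = np.concatenate(views, axis=0)  # each view is (1, W, 3) -> (50, W, 3)

plt.imshow(stacked, aspect="auto")
plt.xlabel("pixel")
plt.ylabel("view index")
plt.show()
```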

Experiments

We perform experiments on four testing scenes:

[Figure: the four testing scenes]

2D NeRF

We fit a 2D NeRF on 50 views, each 100 pixels wide, in under a minute. Below we show the reconstructed testing views after training on each scene:

[Figure: reconstructed testing views for each scene]
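As a rough sketch of what a training iteration looks like (reusing the hypothetical `field` and `render_ray_2d` from the rendering sketch above; the batching here is simplified, not the repository's actual loop):

```python
# Illustrative training loop; names and data layout are assumptions.
import torch

optimizer = torch.optim.Adam(field.parameters(), lr=5e-4)

for step in range(2000):
    # sample_ray_batch is a hypothetical loader yielding ray origins,
    # directions, and the ground-truth pixel colors from the 1D views
    origins, directions, target_pixels = sample_ray_batch()
    pred = torch.stack([render_ray_2d(field, o, d)
                        for o, d in zip(origins, directions)])
    loss = torch.nn.functional.mse_loss(pred, target_pixels)  # photometric loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```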

Additionally, since we are working in 2D space, we can visualize the density field by uniformly sampling $x,y$ coordinates and querying the density field over space, enabling us to visualize the reconstructed geometry:

[Figure: reconstructed density fields for each scene]
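The visualization itself only requires querying the field on a uniform grid; a sketch, reusing the hypothetical `field` from above, with arbitrary bounds and resolution:

```python
# Query the density field on a uniform xy grid and plot it as an image.
import torch
import matplotlib.pyplot as plt

res = 256
xs = torch.linspace(-1.5, 1.5, res)
xy = torch.stack(torch.meshgrid(xs, xs, indexing="xy"), dim=-1).reshape(-1, 2)

with torch.no_grad():
    sigma, _ = field(xy)          # same hypothetical field as above

plt.imshow(sigma.reshape(res, res), origin="lower", cmap="gray")
plt.show()
```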

Positional Encoding

In NeRF, a critical component of its success was the use of positional encoding. The spectral bias of neural networks makes it difficult for them to express high-frequency spatial functions. The NeRF authors found that a simple solution is to pass the coordinates through a positional encoding $\gamma$ before feeding them to the MLP:

$$\gamma(p) = \left(\sin(2^0 \pi p), \cos(2^0 \pi p), \ldots, \sin(2^{L-1} \pi p), \cos(2^{L-1} \pi p)\right)$$

where $\gamma(p)$ is a series of $L$ alternating sine and cosine terms of $p$ with exponentially increasing frequencies, and $L$ is a hyperparameter. In NeRF this proved critical: without it, the model only fits a blurry version of the scene:

[Figure: NeRF reconstruction with and without positional encoding]
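For reference, here is a minimal implementation of the encoding above; it is a sketch of the standard formulation, not necessarily the repository's exact code:

```python
# Encode each coordinate as L sine/cosine pairs with frequencies
# pi * 2^0 .. pi * 2^(L-1), matching the formula above.
import math
import torch

def positional_encoding(p, L):
    freqs = (2.0 ** torch.arange(L)) * math.pi        # (L,)
    angles = p[..., None] * freqs                     # (..., dim, L)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1).flatten(-2)
```

With $L=1$, this reduces to a single $(\sin(\pi p), \cos(\pi p))$ pair per coordinate.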

We validated this in NeRF2D on the "Bunny" scene and, unsurprisingly, found that without positional encoding ($L=1$) the learned density field is very low-frequency. As we increase $L$, we can fit higher-frequency signals, which additionally leads to increased PSNR:

[Figure: reconstructed density fields and PSNR for increasing values of $L$]
