NeRF and NeRS¶

Nader Zantout¶

This work was initially done in partial fulfillment of the requirements of the course Learning for 3D Vision at CMU. It consists of an implementation of a differentiable volume renderer, a neural radiance field (NeRF) model, differentiable sphere tracing for surface rendering, and a neural surface (NeRS) model. Here is the code.

1. Differentiable Volume Rendering¶

A rendered cube volume gif (left) and a depth map (right) are shown below:

[figure: rendered cube volume (left); depth map (right)]

2. Optimizing a basic implicit volume¶

Here is an optimized box volume with center (0.25, 0.25, 0.00) and side lengths (2.00, 1.50, 1.50):

[figure: optimized box volume]

3. Optimizing a Neural Radiance Field¶

I implemented the NeRF architecture from the original paper, without view dependence for the lego bulldozer example. Using the hyperparameters in nerf_lego.yaml, I obtained the following visualization:

[figure: NeRF render of the lego bulldozer]

4. NeRF Extras¶

4.1. View Dependence¶

I added view dependence by concatenating the harmonic embedding of the view direction with the penultimate hidden layer of the MLP, then feeding the concatenated vector through the final hidden layer to produce the feature output. View dependence can be enabled in the yaml file by setting view_dependent = True. In the results below, the specular reflections can be observed:

[figure: view-dependent NeRF render]

View dependence enables modeling specular reflections, which leads to more lifelike volumetric reconstructions of shiny materials. As described in the NeRF paper, however, view dependence is injected only in the final hidden layers of the MLP to prevent overfitting. Overfitting to view-dependent effects at the training views harms generalization to novel views, and may produce unwanted lighting effects and artifacts. Dedicating fewer layers to view-dependent effects is therefore crucial for a good volumetric reconstruction.
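The concatenation step can be sketched in numpy. This is a shape-level illustration only: the `harmonic_embedding` helper, layer widths, and batch sizes here are assumptions for the sketch, not the course codebase's actual modules.

```python
import numpy as np

def harmonic_embedding(x, n_freqs=4):
    """Sin/cos encoding of x at octave frequencies (illustrative helper)."""
    freqs = 2.0 ** np.arange(n_freqs)               # 1, 2, 4, 8
    angles = x[..., None] * freqs                   # (..., D, n_freqs)
    emb = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return emb.reshape(*x.shape[:-1], -1)           # (..., D * 2 * n_freqs)

# Hypothetical shapes: penultimate hidden features and unit view directions
h = np.zeros((10, 128))                             # penultimate hidden layer output
d = np.random.randn(10, 3)
d /= np.linalg.norm(d, axis=-1, keepdims=True)      # normalize view directions
d_emb = harmonic_embedding(d)                       # (10, 24)

# Concatenate; a final linear layer would then map this to the feature output
h_cat = np.concatenate([h, d_emb], axis=-1)         # (10, 152)
```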

5. Sphere Tracing¶

A torus rendered by my sphere tracing implementation is shown below:

[figure: sphere-traced torus]

Sphere tracing begins by initializing the points of intersection at the origins of the rays, $\mathbf{p}_i = \mathbf{o}_i$. The SDF $f$ is then evaluated at each $\mathbf{p}_i$, and each point is updated by marching $f(\mathbf{p}_i)$ units forward along its ray, using the update rule $\mathbf{p}_i := \mathbf{p}_i + f(\mathbf{p}_i)\,\mathbf{d}_i$. This is repeated until either the maximum allowed number of steps is reached, or $f(\mathbf{p}_i) < \epsilon$ for some tolerance $\epsilon$. To simplify a parallel implementation, I kept updating all the points $\mathbf{p}_i$ in a vectorized fashion until the maximum number of steps was reached. At the end of the loop, a mask is computed of all the points $\mathbf{p}_i$ such that $f(\mathbf{p}_i) < \epsilon = 10^{-5}$, a value I found sufficient to include all the points that actually converged.
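The procedure above can be sketched in numpy; the sphere SDF and the ray setup are illustrative stand-ins, not the torus or the renderer's actual rays.

```python
import numpy as np

def sphere_sdf(p, radius=1.0):
    """SDF of a sphere at the origin (stand-in for any SDF f)."""
    return np.linalg.norm(p, axis=-1) - radius

def sphere_trace(origins, dirs, sdf, max_steps=64, eps=1e-5):
    """Vectorized sphere tracing: march every ray for max_steps, then mask."""
    p = origins.copy()                       # p_i = o_i
    for _ in range(max_steps):
        p = p + sdf(p)[..., None] * dirs     # p_i := p_i + f(p_i) * d_i
    mask = sdf(p) < eps                      # rays that converged to the surface
    return p, mask

# Rays from z = -3 pointing toward +z; only the center ray hits the unit sphere
origins = np.array([[0.0, 0.0, -3.0], [0.0, 2.0, -3.0]])
dirs = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
points, mask = sphere_trace(origins, dirs, sphere_sdf)
```

Because all rays are stepped for the full `max_steps`, the diverging ray simply marches off to a large distance and is filtered out by the final mask, exactly as described above.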

6. Optimizing a Neural SDF¶

I used the same architecture I used for NeRF: a 6-hidden-layer MLP with 128 hidden units and ReLU activations, outputting a signed distance (with no output nonlinearity) along with a feature vector. Four harmonic functions are used for the positional encoding of the input point.
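The tensor flow of this architecture can be sketched at the shape level with random numpy weights; the real model is a trained PyTorch MLP, so everything here (initialization, layer names) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# 4 harmonic frequencies on a 3D point -> 3 * 2 * 4 = 24-dim encoding
in_dim, hidden, n_layers = 24, 128, 6
Ws = [rng.normal(0, 0.1, (in_dim if i == 0 else hidden, hidden))
      for i in range(n_layers)]
W_dist = rng.normal(0, 0.1, (hidden, 1))       # distance head: no nonlinearity
W_feat = rng.normal(0, 0.1, (hidden, hidden))  # feature head for the color MLP

def neural_sdf(x_enc):
    h = x_enc
    for W in Ws:
        h = relu(h @ W)                        # 6 ReLU hidden layers, 128 units
    return (h @ W_dist).squeeze(-1), h @ W_feat

dist, feat = neural_sdf(rng.normal(size=(5, 24)))
```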

The network was trained for 10000 epochs at a learning rate of 0.0001. The other hyperparameters in points_surface.yaml were left at their defaults. The eikonal loss is defined in eikonal_loss in losses.py. The input point cloud (left) and the output surface (right) are shown below:

[figure: input point cloud (left); reconstructed surface (right)]
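The eikonal constraint penalizes SDF gradients whose norm deviates from one. A minimal sketch, in which analytic sphere gradients stand in for the autograd gradients the real loss would receive:

```python
import numpy as np

def eikonal_loss(grads):
    """Penalize deviation of SDF gradient norms from 1 (eikonal property)."""
    norms = np.linalg.norm(grads, axis=-1)
    return np.mean((norms - 1.0) ** 2)

# For a true SDF (sphere), the gradient is the unit radial direction -> loss 0
p = np.random.randn(100, 3)
grads_true = p / np.linalg.norm(p, axis=-1, keepdims=True)

# A scaled field 2*f(x) violates the property: gradient norms are 2 -> loss 1
grads_scaled = 2.0 * grads_true
```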

7. VolSDF¶

For color prediction, the feature vector produced by the MLP described in Section 6 is passed through 2 hidden layers followed by a sigmoid-activated output layer. I implemented the SDF-to-density function described in Section 3.1 of the VolSDF paper.
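That conversion can be sketched as alpha times the Laplace(0, beta) CDF of the negated SDF; this is a sketch of the idea from the VolSDF paper, not the course codebase's actual function.

```python
import numpy as np

def sdf_to_density(sdf, alpha=20.0, beta=0.05):
    """VolSDF-style density: alpha times the Laplace(0, beta) CDF of -sdf."""
    s = -np.asarray(sdf, dtype=float)
    return alpha * np.where(
        s <= 0,
        0.5 * np.exp(s / beta),          # outside the surface (sdf > 0)
        1.0 - 0.5 * np.exp(-s / beta),   # inside the surface (sdf < 0)
    )

# inside -> approaches alpha; on the surface -> alpha / 2; outside -> near 0
density = sdf_to_density(np.array([-0.5, 0.0, 0.5]))
```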

Questions:

  1. How does high beta bias your learned SDF? What about low beta?
  • The SDF-to-density function is the CDF of the Laplace distribution, applied to the negated SDF and scaled by alpha. This is a smoothed version of the indicator function $\mathbf{1}_{\Omega}(x)$ that decreases from alpha in the interior to 0 in the exterior, and beta determines the width of that transition. A high beta yields a shallower slope, a 'fuzzier' volume boundary, and less sharp geometry, making the surface look smeared out. A low beta conversely yields a steeper slope, a sharper volume boundary, and a more well-defined transition.
  2. Would an SDF be easier to train with volume rendering and low beta or high beta? Why?
  • Since the slope of the transition is inversely proportional to beta, a low beta concentrates very large gradients near the boundary and near-zero gradients away from it. This can cause overshooting near the boundary and slow convergence elsewhere as the density approaches the discontinuous indicator function, making training more difficult. A high beta spreads gradients more evenly, and therefore makes training easier.
  3. Would you be more likely to learn an accurate surface with high beta or low beta? Why?
  • A high beta produces a 'fuzzier' boundary and therefore a less accurate surface, since density is distributed outside the true boundary. A low beta produces a sharper boundary, and therefore a more accurate surface.
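A quick numeric check of the fuzziness claims above, evaluating the Laplace CDF a fixed distance outside the surface (the distances and beta values are illustrative):

```python
import numpy as np

def laplace_cdf(s, beta):
    """CDF of the Laplace(0, beta) distribution."""
    return np.where(s <= 0, 0.5 * np.exp(s / beta), 1.0 - 0.5 * np.exp(-s / beta))

# Normalized density 0.1 units outside the surface (sdf = +0.1, so s = -0.1)
fuzzy = laplace_cdf(-0.1, beta=0.5)    # high beta: noticeable density leaks out
sharp = laplace_cdf(-0.1, beta=0.01)   # low beta: density is essentially zero
```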

I obtained the best renders with alpha = 20.0 and beta = 0.05:

[figure: VolSDF volume render (left); surface render (right)]

Increasing alpha from its default of 10 to 20 led to a better surface rendering with fewer artifacts and no noticeable change in the volume rendering. The default beta proved optimal: increasing it led to fuzzier surfaces, as expected, while decreasing it produced too many artifacts in the volume render, indicating difficulty in network convergence. Increasing alpha too much also leads to a fuzzier render, since more density falls outside the surface boundary, while decreasing it too much yields a fuzzy, translucent surface with artifacts in the volume rendering.

8. Neural Surface Extras¶

8.1. Render a Large Scene with Sphere Tracing¶

I defined a new SDF class, ComplexSDF, which is a square pyramid of spheres with a 5x5 base and 25+9+1=35 spheres in total. This SDF can be rendered using the configuration complex_surface.yaml, and the rendering is shown below:

[figure: sphere-traced rendering of ComplexSDF]
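A union of sphere SDFs with this layout might look like the following sketch; the sphere spacing and radius here are hypothetical, and ComplexSDF's actual parameters may differ.

```python
import numpy as np

def sphere_sdf(p, center, radius):
    """SDF of a single sphere."""
    return np.linalg.norm(p - center, axis=-1) - radius

# Pyramid layout: 5x5 base, 3x3 middle, 1 on top -> 25 + 9 + 1 = 35 spheres
centers = []
for level, n in enumerate([5, 3, 1]):
    offset = (n - 1) / 2.0
    for i in range(n):
        for j in range(n):
            centers.append(np.array([i - offset, float(level), j - offset]))

def pyramid_sdf(p, radius=0.4):
    """Union of all sphere SDFs via a pointwise minimum."""
    return np.min([sphere_sdf(p, c, radius) for c in centers], axis=0)

# A base-corner sphere center lies inside the union: the sdf there is -radius
val = pyramid_sdf(np.array([2.0, 0.0, 2.0]))
```

Taking the pointwise minimum of SDFs is the standard constructive-solid-geometry union, which composes directly with the sphere tracer from Section 5.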