Introduction
The task of reconstructing a per-scene radiance field from a set of images has recently been revolutionized by Neural Radiance Field (NeRF), thanks to its state-of-the-art quality and flexibility. However, NeRF and its variants require lengthy training, ranging from hours to days for a single scene. In comparison, the proposed technique achieves NeRF-comparable quality and converges from scratch in less than 15 minutes on a single GPU. Our method adopts a representation consisting of a density voxel grid for scene geometry and a feature voxel grid with a shallow network for complex view-dependent appearance. Modeling with explicit, discretized volume representations is not new, but we propose two simple yet non-trivial techniques that contribute to fast convergence and high-quality output. First, we introduce post-activation interpolation on voxel density, which is capable of producing sharp surfaces at lower grid resolution. Second, direct voxel density optimization is prone to suboptimal geometry solutions, so we robustify the optimization process by imposing several priors.
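To illustrate why post-activation interpolation can produce sharp surfaces at low grid resolution, the following is a minimal 1D sketch, not the actual implementation. It assumes a softplus activation followed by the usual volume-rendering opacity, alpha = 1 - exp(-sigma * step); the helper names (`softplus`, `alpha_from_raw`, `lerp`), the raw voxel values, and the step length are hypothetical and chosen only for illustration.

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def alpha_from_raw(raw, step=0.5):
    # Assumed activation: raw density -> softplus -> opacity of a ray
    # segment of length `step` (hypothetical value).
    return 1.0 - np.exp(-softplus(raw) * step)

def lerp(v0, v1, t):
    # Linear interpolation between two neighbouring voxel values.
    return (1.0 - t) * v0 + t * v1

# Two adjacent voxels on a 1D grid storing raw (pre-activation) density.
# The values are illustrative assumptions, not taken from the paper.
raw0, raw1 = -30.0, 90.0

# Sample positions inside the cell, from voxel 0 (t=0) to voxel 1 (t=1).
t = np.linspace(0.0, 1.0, 11)

# Pre-activation: activate at the voxels, then interpolate the opacities.
alpha_pre = lerp(alpha_from_raw(raw0), alpha_from_raw(raw1), t)

# Post-activation: interpolate the raw values, then activate.
alpha_post = alpha_from_raw(lerp(raw0, raw1, t))

# Largest change in opacity between consecutive samples in the cell.
max_jump_pre = float(np.max(np.diff(alpha_pre)))
max_jump_post = float(np.max(np.diff(alpha_post)))

if __name__ == "__main__":
    for ti, a_pre, a_post in zip(t, alpha_pre, alpha_post):
        print(f"t={ti:.1f}  pre={a_pre:.4f}  post={a_post:.4f}")
```

Under these assumptions, interpolating already-activated opacities can only yield a linear ramp across the cell, whereas interpolating the raw values first lets the nonlinearity place a near-step transition inside a single cell.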
Code: https://github.com/sunset1995/DirectVoxGO
Project page: https://sunset1995.github.io/dvgo/