GRAM: Generative Radiance Manifolds for 3D-Aware Image Generation
CVPR 2022 (Oral Presentation)
3D-aware image generative modeling aims to generate 3D-consistent images with explicitly controllable camera poses. Recent works have shown promising results by training neural radiance field (NeRF) generators on unstructured 2D image collections, but they still can not generate highly-realistic images with fine details. A critical reason is that the high memory and computation cost of volumetric representation learning greatly restricts the number of point samples for radiance integration during training. Deficient point sampling not only limits the expressive power of the generator to handle high-frequency details but also impedes effective GAN training due to the noise caused by unstable Monte Carlo sampling. We propose a novel approach that regulates point sampling and radiance field learning on 2D manifolds, embodied as a set of implicit surfaces in the 3D volume learned with GAN training. For each viewing ray, we calculate the ray-surface intersections and accumulate their radiance predicted by the network. We show that by training and rendering such radiance manifolds, our generator can produce high quality images with realistic fine details and strong visual 3D consistency.
Overview of the GRAM method. The generator G consists of a manifold predictor M and a radiance generator Φ. M predicts multiple isosurfaces which define the input domain of Φ. The intersections between camera rays and the isosurfaces are sent to Φ for color and occupancy prediction. Images are then generated by compositing the color of the points along the ray.
GRAM is able to generate high-quality images with fine details. Moreover, it allows an explicit control of camera viewpoint and achieves highly consistent results across different views. It even maintains strong visual 3D consistency for very thin structures such as bangs of hair, eyeglass, and whiskers of cat.
GRAM achieves the best visual quality with realistic details and remarkable 3D consistency on multiple datasets comparing with previous 3D-aware image generation methods.
GRAM constrains point sampling and radiance field learning on 2D manifolds, embodied as a set of implicit surfaces. These implicit surfaces are shared for the trained object category, jointly learned with GAN training, and fixed at inference time.
3D Geometry Visualization
Although GRAM confines the input domain of the radiance field on 2D manifolds, we can still extract proxy 3D shapes of the generated objects using the volume-based marching cubes algorithm. It can be observed that GRAM produces high-quality geometry with detailed structures well depicted, which is the key to achieve strong visual 3D consistency across different views.
Image Embedding and Editing
GAN inversion is naturally supported by GRAM. Given an input image, we can first embed it into the learned latent space and then freely move the camera viewpoint to synthesize images at novel views.
We thank Harry Shum for the fruitful advice and discussion to improve the paper.
The website template was adapted from Mip-NeRF.