EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS

Sharath Girish          Kamal Gupta          Abhinav Shrivastava         
University of Maryland, College Park

Our method achieves reconstruction quality similar to 3D-GS (PSNR in dB) with a 10-20x smaller model size (MB) and faster rendering speed (FPS).

Abstract

Recently, 3D Gaussian splatting (3D-GS) has gained popularity in novel-view scene synthesis. It addresses the challenges of lengthy training times and slow rendering speeds associated with Neural Radiance Fields (NeRFs). Through rapid, differentiable rasterization of 3D Gaussians, 3D-GS achieves real-time rendering and accelerated training. It, however, demands substantial memory for both training and storage, as it requires millions of Gaussians in its point cloud representation of each scene.

We present a technique that uses quantized embeddings to significantly reduce memory storage requirements, along with a coarse-to-fine training strategy for faster and more stable optimization of the Gaussian point clouds. Our approach yields scene representations with fewer Gaussians and quantized attributes, enabling faster training and real-time rendering of high-resolution scenes. We reduce memory by more than an order of magnitude while maintaining reconstruction quality. We validate the effectiveness of our approach on a variety of datasets and scenes, preserving visual quality while consuming 10-20x less memory and achieving faster training and inference speeds.

Approach

Our approach consists of two key components: (1) quantized embeddings and (2) coarse-to-fine training. We represent the attributes of each Gaussian with quantized latents that are decoded by a small decoder to recover the original attributes (see the sketch below). We also progressively increase the resolution at which we render the image and apply the reconstruction loss; this speeds up training and leads to better optimization. We additionally control the frequency of Gaussian densification, which significantly improves speed without compromising quality.
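To make the first component concrete, below is a minimal PyTorch sketch of per-Gaussian quantized latents decoded into an attribute (e.g., color). The straight-through round-based quantizer, the MLP shape, and the class name QuantizedAttribute are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class QuantizedAttribute(nn.Module):
    """Hypothetical sketch: learnable latents per Gaussian, quantized to a
    fixed number of levels and decoded by a small MLP into an attribute."""
    def __init__(self, num_gaussians, latent_dim, attr_dim, num_levels=256):
        super().__init__()
        self.latents = nn.Parameter(torch.zeros(num_gaussians, latent_dim))
        self.num_levels = num_levels
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, attr_dim),
        )

    def forward(self):
        # Map latents to [0, 1], quantize to discrete levels, and apply a
        # straight-through estimator so gradients still reach the latents.
        z = torch.sigmoid(self.latents)
        zq = torch.round(z * (self.num_levels - 1)) / (self.num_levels - 1)
        zq = z + (zq - z).detach()  # straight-through gradient
        return self.decoder(zq)     # decoded attribute per Gaussian

# Usage: decode quantized color features for 100k Gaussians.
colors = QuantizedAttribute(num_gaussians=100_000, latent_dim=16, attr_dim=3)()

Only the discrete latents and the small decoder need to be stored, which is what drives the reduction in model size relative to storing full-precision attributes per Gaussian.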

Accelerated training

Our method achieves better reconstruction quality early in training compared to 3D-GS and converges faster, reaching similar final quality.
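The coarse-to-fine strategy contributes to this speedup. The snippet below is one possible way to implement a progressive-resolution schedule; the linear ramp, the fractions, and the helper name render_resolution are assumptions for illustration.

import torch
import torch.nn.functional as F

def render_resolution(step, total_steps, full_hw, start_scale=0.25, end_frac=0.5):
    """Grow the supervised resolution linearly from start_scale * full size
    to full size over the first end_frac of training."""
    t = min(step / (end_frac * total_steps), 1.0)
    scale = start_scale + (1.0 - start_scale) * t
    h, w = full_hw
    return max(1, int(h * scale)), max(1, int(w * scale))

# Example: downsample the ground truth to match the current render size
# before computing the reconstruction loss.
H, W = 1080, 1920
gt = torch.rand(3, H, W)
h, w = render_resolution(step=1000, total_steps=30_000, full_hw=(H, W))
gt_small = F.interpolate(gt[None], size=(h, w), mode="bilinear",
                         align_corners=False)[0]

Rendering and supervising at lower resolutions early on means each iteration rasterizes fewer pixels, which is where the early-training speedup comes from.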

Qualitative Comparisons

We show qualitative results on various scenes from the Mip-NeRF 360 dataset. Fast NeRF methods such as Instant-NGP and Plenoxels produce blurry results. Mip-NeRF 360, while producing sharp results, is extremely slow to train and render. 3D-GS, on the other hand, exhibits floaters at the boundaries of the scene due to ill-optimized Gaussians. Our method reconstructs the scene with fine details and fewer artifacts while also being much smaller and faster.

GPU Memory Usage


We compare the peak GPU memory usage of our method with 3D-GS during training and rendering. Our method consumes less GPU memory in both phases. Notably, for the Bicycle scene, we obtain a 42% reduction during training, requiring only 10 GB of GPU memory compared to 17.4 GB for 3D-GS. This allows training on such high-resolution scenes even with standard consumer-grade GPUs (e.g., NVIDIA RTX 2080 Ti).
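For reference, peak GPU memory of this kind can be measured in PyTorch as follows; this is a general-purpose snippet, not the paper's benchmarking code.

import torch

torch.cuda.reset_peak_memory_stats()
# ... run one training iteration or render a frame here ...
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU memory: {peak_gb:.1f} GB")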