PyGS : Large-scale Scene Representation with Pyramidal 3D Gaussian Splatting

Zipeng Wang, Dan Xu
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology (HKUST)


Demo Video

Abstract

Neural Radiance Fields (NeRFs) have demonstrated remarkable proficiency in synthesizing photorealistic images of large-scale scenes. However, they often suffer from a loss of fine details and long rendering times. 3D Gaussian Splatting has recently been introduced as a potent alternative, achieving both high-fidelity visual results and accelerated rendering. Nonetheless, scaling 3D Gaussian Splatting to large scenes poses challenges. Specifically, large-scale scenes contain objects at multiple scales observed from disparate viewpoints, which often degrades quality because each Gaussian must balance between detail levels. Furthermore, generating initialization points with COLMAP on large-scale datasets is both computationally demanding and prone to incomplete reconstructions. To address these challenges, we present Pyramidal 3D Gaussian Splatting (PyGS) with NeRF initialization. Our approach represents the scene with a hierarchical assembly of Gaussians arranged in a pyramidal fashion. The top level of the pyramid is composed of a few large Gaussians, while each subsequent level accommodates a denser collection of smaller Gaussians. We initialize these pyramidal Gaussians by sampling a rapidly trained grid-based NeRF at various frequencies. We group the pyramidal Gaussians into clusters, and a compact weighting network dynamically determines the influence of each pyramid level for each cluster based on the camera viewpoint during rendering. Our method achieves a significant performance leap across multiple large-scale datasets and renders over 400 times faster than current state-of-the-art approaches.
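The multi-frequency initialization described above can be illustrated with a minimal sketch: subsample a dense point cloud (here a random stand-in for points sampled from the grid-based NeRF) at increasing rates, one subset per pyramid level. The keep ratio and the 4x per-level growth factor are invented for this example and are not the paper's actual hyperparameters.

```python
import numpy as np

def build_pyramid_levels(dense_points, num_levels=4, base_keep=0.01):
    """Subsample a dense point cloud at increasing rates, one subset per
    pyramid level. The top level keeps the fewest points (initializing a
    few large Gaussians); lower levels keep progressively more points
    (initializing many smaller Gaussians).

    NOTE: base_keep and the 4x growth factor are illustrative choices."""
    rng = np.random.default_rng(0)
    n = len(dense_points)
    levels = []
    for lvl in range(num_levels):
        keep = min(1.0, base_keep * (4 ** lvl))  # denser at lower levels
        idx = rng.choice(n, size=max(1, int(n * keep)), replace=False)
        levels.append(dense_points[idx])
    return levels

# Stand-in for a dense cloud of 100k points sampled from a trained NeRF.
dense = np.random.rand(100_000, 3)
pyramid = build_pyramid_levels(dense)
print([len(p) for p in pyramid])  # [1000, 4000, 16000, 64000]
```

Each subset then serves as the set of initial Gaussian centers for its pyramid level.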

Method

PyGS parameterizes the scene using a hierarchical structure of 3D Gaussians, organized into pyramid levels that represent different levels of detail. The top level comprises a few large Gaussians that capture the overall scene shape and structure, while the lower levels contain a greater number of smaller Gaussians that model finer scene details. To initialize the pyramid, we sample a rapidly trained grid-based NeRF to form a dense point cloud and generate several subsets by subsampling it at multiple frequencies; each subset initializes the corresponding level of the pyramidal Gaussian structure.

When rendering an image, the contribution of each level should be chosen adaptively based on both the camera viewpoint and the complexity of the rendered region. For example, when the camera is distant or the region is smooth and textureless, the higher levels (fewer, larger Gaussians) should be prioritized; conversely, when the camera is close or the region has complex geometry or textures, the lower levels (more, smaller Gaussians) should come into play. To better capture the nuances of local geometry and texture, we introduce a learnable embedding for each cluster. This embedding, along with the camera viewpoint, is fed into a weighting network that computes the weights of the different pyramid levels for that cluster; consequently, Gaussians belonging to the same level and cluster share the same weight. Additionally, we integrate a per-cluster appearance embedding and a color correction network to account for the intricate variations in lighting that can occur across viewpoints.
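The per-cluster weighting step can be sketched as a tiny MLP that maps a learnable cluster embedding plus the camera viewpoint to softmax weights over the pyramid levels. This is a minimal illustration under assumptions: the layer sizes, the use of the raw camera position as the viewpoint feature, and the random weights stand in for learned parameters and are not the paper's architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class LevelWeightingNet:
    """Toy MLP: (cluster embedding, camera position) -> one weight per
    pyramid level. All Gaussians in the same cluster and level would share
    the resulting weight. Sizes and inputs are illustrative assumptions."""
    def __init__(self, embed_dim=8, num_levels=4, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = embed_dim + 3  # cluster embedding + camera xyz
        self.w1 = rng.standard_normal((in_dim, hidden)) * 0.1
        self.b1 = np.zeros(hidden)
        self.w2 = rng.standard_normal((hidden, num_levels)) * 0.1
        self.b2 = np.zeros(num_levels)

    def __call__(self, cluster_embed, cam_pos):
        x = np.concatenate([cluster_embed, cam_pos])
        h = np.maximum(x @ self.w1 + self.b1, 0.0)  # ReLU hidden layer
        return softmax(h @ self.w2 + self.b2)       # weights sum to 1

net = LevelWeightingNet()
# Query the level weights for one cluster as seen from one camera pose.
weights = net(np.zeros(8), np.array([0.0, 0.0, 5.0]))
print(weights.shape)  # (4,): one weight per pyramid level
```

In training, the cluster embeddings and network parameters would be optimized jointly with the Gaussians, so distant or smooth regions learn to up-weight the coarse levels and close-up, detailed regions up-weight the fine ones.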

Results

Results on the Mill19 dataset.
Results on the UrbanScene3D dataset.
Results on the MatrixCity dataset.
Results on the BungeeNeRF dataset.

Examples

Interactive comparisons: Ours vs. 3DGS, MegaNeRF, GridNeRF, SwitchNeRF, and BungeeNeRF.

Longer Fly-through

BibTeX

@misc{wang2024pygs,
        title={PyGS: Large-scale Scene Representation with Pyramidal 3D Gaussian Splatting}, 
        author={Zipeng Wang and Dan Xu},
        year={2024},
        eprint={2405.16829},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
}