PanoLAM: Large Avatar Model for Gaussian Full-Head Synthesis from One-shot Unposed Image


1HKUST

2Tongyi Lab, Alibaba Group

3Fudan University

* Equal Contribution     † Corresponding Author    

PanoLAM creates high-fidelity Gaussian full heads from one-shot unposed images within one second.

Abstract

We present a feed-forward framework for Gaussian full-head synthesis from a single unposed image. Unlike previous work that relies on time-consuming GAN inversion and test-time optimization, our framework reconstructs the Gaussian full-head model from a single unposed image in a single forward pass, which enables fast reconstruction and rendering at inference time. To mitigate the lack of large-scale 3D head assets, we build a large-scale synthetic dataset from trained 3D GANs and train our framework using only synthetic data. For efficient high-fidelity generation, we introduce a coarse-to-fine Gaussian head generation pipeline: sparse points from the FLAME model interact with the image features through transformer blocks for feature extraction and coarse shape reconstruction, and the points are then densified for high-fidelity reconstruction. To fully leverage the prior knowledge residing in pretrained 3D GANs, we propose a dual-branch framework that aggregates structured spherical triplane features and unstructured point-based features for more effective Gaussian head reconstruction. Experimental results show the effectiveness of our framework compared with existing work.

Method


Given an unposed head image as input, PanoLAM uses two branches to achieve single-pass 3D Gaussian head reconstruction: a point-based transformer for coarse-to-fine point shape reconstruction and point feature extraction, and a spherical triplane transformer that distills prior knowledge from a 3D GAN. Features from the two branches are concatenated for high-fidelity Gaussian head regression.
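
The fusion step described above can be illustrated with a minimal NumPy sketch. This is not the PanoLAM implementation: the function names `sample_triplane` and `fuse_features`, the feature sizes, the nearest-neighbour lookup, and the use of plain axis-aligned planes in place of the spherical triplane are all assumptions made for illustration. It only shows how per-point features from a point branch might be concatenated with features sampled from a plane-based branch before a regression head.

```python
import numpy as np


def sample_triplane(planes, points):
    """Look up one feature vector per point from three axis-aligned planes.

    Hypothetical simplification: nearest-neighbour sampling on Cartesian
    planes, not the spherical triplane used in the paper.

    planes: (3, R, R, C) features for the (x, y), (x, z), (y, z) projections.
    points: (N, 3) coordinates in [-1, 1].
    Returns an (N, C) array: the sum of the three sampled plane features.
    """
    res = planes.shape[1]
    # Map coordinates from [-1, 1] to integer pixel indices in [0, res - 1].
    idx = np.rint((points + 1.0) * 0.5 * (res - 1)).astype(int)
    idx = np.clip(idx, 0, res - 1)
    x, y, z = idx[:, 0], idx[:, 1], idx[:, 2]
    return planes[0][x, y] + planes[1][x, z] + planes[2][y, z]


def fuse_features(point_feats, planes, points):
    """Concatenate point-branch features with sampled plane-branch features."""
    plane_feats = sample_triplane(planes, points)
    return np.concatenate([point_feats, plane_feats], axis=-1)


# Toy inputs with assumed sizes: 5 points, 8 point-feature channels,
# 16x16 planes with 4 channels each.
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(5, 3))
point_feats = rng.normal(size=(5, 8))
planes = rng.normal(size=(3, 16, 16, 4))

fused = fuse_features(point_feats, planes, points)
print(fused.shape)  # (5, 12)
```

In a full model, the fused per-point vector would be passed to a learned head that regresses Gaussian attributes; that head is omitted here because the page does not specify it.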

Results

PanoLAM synthesizes a Gaussian full head from a one-shot unposed image, enabling 360° free-view rendering.
