Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer

Xiaobing.AI

Portrait4D-v2 takes a source image (left) as input and synthesizes its lifelike 4D head avatar (middle) given another driving video (right) for reenactment.

Abstract

In this paper, we propose a novel learning approach for feed-forward one-shot 4D head avatar synthesis. Unlike existing methods that typically learn by reconstructing monocular videos under 3DMM guidance, we employ pseudo multi-view videos to learn a 4D head synthesizer in a data-driven manner, avoiding reliance on inaccurate 3DMM reconstruction that could be detrimental to synthesis performance. The key idea is to first learn a 3D head synthesizer from synthetic multi-view images and use it to convert monocular real videos into multi-view ones, and then utilize the pseudo multi-view videos to learn a 4D head synthesizer via cross-view self-reenactment. By leveraging a simple vision transformer backbone with motion-aware cross-attentions, our method exhibits superior performance compared to previous methods in terms of reconstruction fidelity, geometry consistency, and motion control accuracy. We hope our method offers novel insights into integrating 3D priors with 2D supervision for improved 4D head avatar creation.
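The abstract mentions motion-aware cross-attention, where appearance features attend to driving-motion features to inject expression and pose into the identity representation. The sketch below is a minimal, hedged illustration of that general mechanism in plain NumPy; the function and variable names are our own, and the real model uses learned weights inside a vision transformer rather than the random projections shown here.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def motion_aware_cross_attention(appearance_tokens, motion_tokens, d_head=32, seed=0):
    """Toy cross-attention: appearance tokens (queries) attend to driving-motion
    tokens (keys/values), so motion information modulates identity features.

    appearance_tokens: (N_a, d) array of source-appearance features.
    motion_tokens:     (N_m, d) array of driving-motion features.
    Random projection matrices stand in for learned weights.
    """
    rng = np.random.default_rng(seed)
    d = appearance_tokens.shape[-1]
    Wq = rng.standard_normal((d, d_head)) / np.sqrt(d)
    Wk = rng.standard_normal((d, d_head)) / np.sqrt(d)
    Wv = rng.standard_normal((d, d_head)) / np.sqrt(d)
    Wo = rng.standard_normal((d_head, d)) / np.sqrt(d_head)
    q = appearance_tokens @ Wq
    k = motion_tokens @ Wk
    v = motion_tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(d_head))      # (N_a, N_m) attention map
    return appearance_tokens + (attn @ v) @ Wo     # residual motion injection
```

The residual form keeps the source identity dominant while letting the driving tokens perturb it, which matches the general design of appearance/motion cross-attention blocks.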

Video


Framework

Architecture of the 4D head reconstruction model.

Overview of our approach. Given a monocular video sampled from the training set, we first leverage a pre-trained 3D synthesizer Ψ3d to turn each driving frame of the video into multi-view ones, and then use the pseudo multi-view driving frames, together with a source frame sampled from the original video, to perform cross-view self-reenactment for learning a feed-forward 4D head synthesizer Ψ. After training, Ψ can synthesize an animatable 3D head given two arbitrary images that provide the source appearance and the driving motion, respectively.
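The cross-view self-reenactment objective described above can be sketched as a training loop: each driving frame is re-rendered under novel cameras by the frozen 3D synthesizer, and the 4D synthesizer must reproduce those pseudo multi-view targets from a source frame of the same video. The code below is a schematic stand-in, not the actual implementation; `psi_3d`, `psi_4d`, and the toy arithmetic inside them are hypothetical placeholders for the real renderers and networks.

```python
import numpy as np

def psi_3d(frame, camera):
    """Stand-in for the frozen, pre-trained 3D synthesizer that re-renders a
    single frame under a new camera. Real version: a generative 3D renderer."""
    return frame + 0.01 * camera  # placeholder arithmetic, not a real render

def psi_4d(source, driving, camera):
    """Stand-in for the feed-forward 4D synthesizer being trained: combines
    source appearance with driving motion and renders under a camera."""
    return 0.5 * (source + driving) + 0.01 * camera  # placeholder

def reconstruction_loss(pred, target):
    """Simple L2 photometric loss between prediction and pseudo-GT view."""
    return float(np.mean((pred - target) ** 2))

def train_step(video, cameras):
    """One cross-view self-reenactment step over a monocular video clip.

    video:   (T, D) array of frames (flattened toy 'images').
    cameras: list of novel camera parameters (scalars here for simplicity).
    """
    source = video[0]                         # source appearance frame
    losses = []
    for frame in video[1:]:                   # each driving frame
        for cam in cameras:                   # each pseudo novel view
            target = psi_3d(frame, cam)       # pseudo multi-view target
            pred = psi_4d(source, frame, cam) # reenact source with this motion
            losses.append(reconstruction_loss(pred, target))
    return sum(losses) / len(losses)          # average supervision signal
```

In the actual method the loss would be backpropagated into Ψ (here `psi_4d`) while Ψ3d stays frozen; the loop only illustrates how monocular data yields multi-view supervision.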


Results


Talking Head Synthesis

Our method can synthesize vivid 4D talking heads via video-based reenactment. It faithfully reconstructs the source appearance while mimicking the nuanced expressions in different driving videos.



Free View Rendering

Our method supports free-view rendering of the head avatars thanks to the underlying 3D representation. Use the slider below to linearly change the camera viewpoint.

[Interactive comparison viewers: six examples, each pairing a Source Image and a Driving Image with a slider-controlled free-view rendering of the resulting avatar.]


BibTeX

@article{deng2024portrait4dv2,
  title     = {Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer},
  author    = {Deng, Yu and Wang, Duomin and Wang, Baoyuan},
  journal   = {arXiv},
  year      = {2024},
}