Portrait3D: Text-Guided High-Quality 3D Portrait Generation Using Pyramid Representation and GANs Prior

Cited by: 1

Authors:

Wu, Yiqian [1]

Xu, Hao [1]

Tang, Xiangjun [1]

Chen, Xien [2]

Tang, Siyu [3]

Zhang, Zhebin [4]

Li, Chen [4]

Jin, Xiaogang [1]

Affiliations:

[1] Zhejiang Univ, State Key Lab CAD&CG, Hangzhou, Peoples R China

[2] Yale Univ, New Haven, CT USA

[3] Swiss Fed Inst Technol, Zurich, Switzerland

[4] OPPO US Res Ctr, Menlo Pk, CA USA

Source:

ACM TRANSACTIONS ON GRAPHICS | 2024 / Vol. 43 / Issue 04

Funding:

National Natural Science Foundation of China;

Keywords:

3D portrait generation; 3D-aware GANs; diffusion models;

DOI:

10.1145/3658162

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Existing neural rendering-based text-to-3D-portrait generation methods typically make use of human geometry prior and diffusion models to obtain guidance. However, relying solely on geometry information introduces issues such as the Janus problem, over-saturation, and over-smoothing. We present Portrait3D, a novel neural rendering-based framework with a novel joint geometry-appearance prior to achieve text-to-3D-portrait generation that overcomes the aforementioned issues. To accomplish this, we train a 3D portrait generator, 3DPortraitGAN, as a robust prior. This generator is capable of producing 360° canonical 3D portraits, serving as a starting point for the subsequent diffusion-based generation process. To mitigate the "grid-like" artifact caused by the high-frequency information in the feature-map-based 3D representation commonly used by most 3D-aware GANs, we integrate a novel pyramid tri-grid 3D representation into 3DPortraitGAN. To generate 3D portraits from text, we first project a randomly generated image aligned with the given prompt into the pre-trained 3DPortraitGAN's latent space. The resulting latent code is then used to synthesize a pyramid tri-grid. Beginning with the obtained pyramid tri-grid, we use score distillation sampling to distill the diffusion model's knowledge into the pyramid tri-grid. Following that, we utilize the diffusion model to refine the rendered images of the 3D portrait and then use these refined images as training data to further optimize the pyramid tri-grid, effectively eliminating issues with unrealistic color and unnatural artifacts. Our experimental results show that Portrait3D can produce realistic, high-quality, and canonical 3D portraits that align with the prompt.
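
For orientation, the score distillation sampling step mentioned in the abstract is commonly written in the form below. This is the generic formulation, not one taken from this record: the abstract does not state the exact loss or weighting Portrait3D uses, so every symbol here is an assumption. In this reading, \theta would stand for the pyramid tri-grid parameters, g for the renderer, y for the text prompt, \hat{\epsilon}_\phi for the frozen diffusion model's noise prediction, and w(t) for a timestep-dependent weight.

```latex
% Generic score distillation sampling (SDS) gradient; symbols are assumptions,
% not notation taken from the Portrait3D abstract.
\[
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad
x = g(\theta),
\quad
x_t = \alpha_t\, x + \sigma_t\, \epsilon,
\quad
\epsilon \sim \mathcal{N}(0, I).
\]
```

Under this form, the gradient pushes rendered views x toward images the diffusion model considers likely for the prompt y, without backpropagating through the diffusion model itself.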

Pages: 12