Portrait3D: Text-Guided High-Quality 3D Portrait Generation Using Pyramid Representation and GANs Prior

Cited by: 1
Authors
Wu, Yiqian [1]
Xu, Hao [1]
Tang, Xiangjun [1]
Chen, Xien [2]
Tang, Siyu [3]
Zhang, Zhebin [4]
Li, Chen [4]
Jin, Xiaogang [1]
Affiliations
[1] Zhejiang Univ, State Key Lab CAD&CG, Hangzhou, Peoples R China
[2] Yale Univ, New Haven, CT USA
[3] Swiss Fed Inst Technol, Zurich, Switzerland
[4] OPPO US Res Ctr, Menlo Pk, CA USA
Source
ACM TRANSACTIONS ON GRAPHICS | 2024 / Vol. 43 / No. 4
Funding
National Natural Science Foundation of China
Keywords
3D portrait generation; 3D-aware GANs; diffusion models
DOI
10.1145/3658162
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Existing neural rendering-based text-to-3D-portrait generation methods typically make use of human geometry prior and diffusion models to obtain guidance. However, relying solely on geometry information introduces issues such as the Janus problem, over-saturation, and over-smoothing. We present Portrait3D, a novel neural rendering-based framework with a novel joint geometry-appearance prior to achieve text-to-3D-portrait generation that overcomes the aforementioned issues. To accomplish this, we train a 3D portrait generator, 3DPortraitGAN, as a robust prior. This generator is capable of producing 360° canonical 3D portraits, serving as a starting point for the subsequent diffusion-based generation process. To mitigate the "grid-like" artifact caused by the high-frequency information in the feature-map-based 3D representation commonly used by most 3D-aware GANs, we integrate a novel pyramid tri-grid 3D representation into 3DPortraitGAN. To generate 3D portraits from text, we first project a randomly generated image aligned with the given prompt into the pre-trained 3DPortraitGAN's latent space. The resulting latent code is then used to synthesize a pyramid tri-grid. Beginning with the obtained pyramid tri-grid, we use score distillation sampling to distill the diffusion model's knowledge into the pyramid tri-grid. Following that, we utilize the diffusion model to refine the rendered images of the 3D portrait and then use these refined images as training data to further optimize the pyramid tri-grid, effectively eliminating issues with unrealistic color and unnatural artifacts. Our experimental results show that Portrait3D can produce realistic, high-quality, and canonical 3D portraits that align with the prompt.
Pages: 12