TECA: Text-Guided Generation and Editing of Compositional 3D Avatars

Citations: 0
Authors
Zhang, Hao [1 ,3 ,4 ]
Feng, Yao [1 ,2 ]
Kulits, Peter [1 ]
Wen, Yandong [1 ]
Thies, Justus [1 ]
Black, Michael J. [1 ]
Affiliations
[1] Max Planck Inst Intelligent Syst, Stuttgart, Germany
[2] Swiss Fed Inst Technol, Zurich, Switzerland
[3] Tsinghua Univ, Beijing, Peoples R China
[4] Rhein Westfal TH Aachen, Aachen, Germany
Source
2024 INTERNATIONAL CONFERENCE ON 3D VISION, 3DV 2024 | 2024
DOI
10.1109/3DV62453.2024.00151
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Our goal is to create a realistic 3D facial avatar with hair and accessories using only a text description. While this challenge has attracted significant recent interest, existing methods either lack realism, produce unrealistic shapes, or do not support editing, such as modifications to the hairstyle. We argue that existing methods are limited because they employ a monolithic modeling approach, using a single representation for the head, face, hair, and accessories. Our observation is that the hair and face, for example, have very different structural qualities that benefit from different representations. Building on this insight, we generate avatars with a compositional model, in which the head, face, and upper body are represented with traditional 3D meshes, and the hair, clothing, and accessories with neural radiance fields (NeRF). The model-based mesh representation provides a strong geometric prior for the face region, improving realism while enabling editing of the person's appearance. By using NeRFs to represent the remaining components, our method is able to model and synthesize parts with complex geometry and appearance, such as curly hair and fluffy scarves. Our novel system synthesizes these high-quality compositional avatars from text descriptions. Specifically, we generate a face image using text, fit a parametric shape model to it, and inpaint texture using diffusion models. Conditioned on the generated face, we sequentially generate style components such as hair or clothing using Score Distillation Sampling (SDS) with guidance from CLIPSeg segmentations. However, this alone is not sufficient to produce avatars with a high degree of realism. Consequently, we introduce a hierarchical approach to refine the non-face regions using a BLIP-based loss combined with SDS. 
The experimental results demonstrate that our method, Text-guided generation and Editing of Compositional Avatars (TECA), produces avatars that are more realistic than those of recent methods while being editable because of their compositional nature. For example, our TECA enables the seamless transfer of compositional features like hairstyles, scarves, and other accessories between avatars. This capability supports applications such as virtual try-on. The code and generated avatars will be publicly available for research purposes at yfeng95.github.io/teca.
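The compositional idea in the abstract, a mesh-based head paired with separately generated NeRF components that can be moved between avatars, can be illustrated with a minimal data-structure sketch. This is not the authors' implementation: the names `MeshPart`, `NerfPart`, `Avatar`, and `transfer`, as well as the `shape_params`, `texture_id`, and `weights_id` fields, are hypothetical placeholders, and the sketch ignores any geometric re-alignment that a real transfer between differently shaped heads may need.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class MeshPart:
    """Stand-in for the mesh-based head, face, and upper body (hypothetical)."""
    shape_params: Tuple[float, ...]  # placeholder for parametric shape coefficients
    texture_id: str                  # placeholder handle to an inpainted texture


@dataclass(frozen=True)
class NerfPart:
    """Stand-in for one NeRF-based style component such as hair or a scarf."""
    prompt: str      # text description used to generate the component
    weights_id: str  # placeholder handle to trained NeRF weights


@dataclass(frozen=True)
class Avatar:
    """An avatar composed of one mesh body plus named NeRF style components."""
    body: MeshPart
    styles: Dict[str, NerfPart] = field(default_factory=dict)


def transfer(source: Avatar, target: Avatar, name: str) -> Avatar:
    """Return a copy of `target` that carries `source`'s component `name`."""
    if name not in source.styles:
        raise KeyError(f"source avatar has no component named {name!r}")
    new_styles = dict(target.styles)  # copy so the target avatar is not mutated
    new_styles[name] = source.styles[name]
    return replace(target, styles=new_styles)
```

Because each component is stored under its own key, swapping a hairstyle leaves the target's face mesh and its other accessories untouched, which is the property the abstract attributes to the compositional design.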
Pages: 1520-1530
Number of pages: 11