Disentangled text-driven stylization of 3D faces via directional CLIP losses

Times cited: 0
Authors
Gao, Wenjing [1]
Li, Xi [1]
Liu, Chang [2, 3]
Wang, Jiaojiao [2, 3]
Yu, Dingguo [2, 3]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310013, Peoples R China
[2] Commun Univ Zhejiang, Coll Media Engn, Hangzhou 310018, Peoples R China
[3] Commun Univ Zhejiang, Key Lab Film & TV Media Technol Zhejiang Prov, Hangzhou 310018, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Text-driven 3D face stylization; Geometry deformation; Texture transformation; CLIP model;
DOI
10.1007/s00371-025-04047-9
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
3D face stylization remains challenging due to limited training samples, diverse style domains, and the complex mapping between ambiguous style features and 3D face structures. To address these issues, we propose ClipStyleFace, a text-driven approach for 3D face stylization that leverages CLIP (Contrastive Language-Image Pre-training) knowledge to create style variations in both geometric and texture structures. ClipStyleFace comprises three components. For geometry deformation, a deformable surface is designed to model stylized geometric residuals on the initial mesh. For texture transformation, we construct a compact parameter space enabling style transfer using a pre-trained albedo generator. Both modules are optimized consistently by distilling semantic alignment and domain correction knowledge from the CLIP model. Extensive experiments demonstrate the effectiveness of our approach in generating stylized 3D faces that match target style prompts while preserving identity characteristics and facial details. Our model also holds promise for applications such as animation and image-driven 3D stylized face generation. Our code is released on https://github.com/cutegao715/ClipStyleFace.
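The abstract does not state the loss itself; the title refers to directional CLIP losses. As a minimal sketch, assuming the commonly used directional formulation (one minus the cosine similarity between the image-space edit direction and the text-space edit direction), the idea can be illustrated on precomputed embedding vectors. The function name, argument names, and toy vectors below are hypothetical and are not taken from the ClipStyleFace code; in practice the embeddings would come from CLIP's image and text encoders.

```python
import math


def directional_clip_loss(img_src, img_sty, txt_src, txt_tgt, eps=1e-8):
    """Return 1 - cos(delta_image, delta_text) for precomputed embeddings.

    img_src / img_sty: embeddings of the original and stylized renders.
    txt_src / txt_tgt: embeddings of the source and target style prompts.
    All four are plain sequences of floats of equal length (an assumption
    made for illustration only).
    """
    # Edit direction in image-embedding space.
    d_img = [b - a for a, b in zip(img_src, img_sty)]
    # Edit direction in text-embedding space.
    d_txt = [b - a for a, b in zip(txt_src, txt_tgt)]

    dot = sum(x * y for x, y in zip(d_img, d_txt))
    norm = math.sqrt(sum(x * x for x in d_img)) * math.sqrt(sum(y * y for y in d_txt))

    # eps guards against division by zero when a direction vanishes.
    return 1.0 - dot / (norm + eps)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real CLIP embeddings.
    aligned = directional_clip_loss([1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 2.0])
    opposite = directional_clip_loss([1.0, 0.0], [1.0, -1.0], [0.0, 0.0], [0.0, 2.0])
    print(f"aligned: {aligned:.4f}, opposite: {opposite:.4f}")
```

Under this assumed formulation the loss is near 0 when the stylized render moves in the same embedding direction as the prompt change, and near 2 when it moves in the opposite direction, which is why it penalizes the direction of change rather than the absolute similarity to the target prompt.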
Pages: 16