Disentangled text-driven stylization of 3D faces via directional CLIP losses

Cited by: 0
Authors
Gao, Wenjing [1]
Li, Xi [1]
Liu, Chang [2,3]
Wang, Jiaojiao [2,3]
Yu, Dingguo [2,3]
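Affiliations
[1] Zhejiang University, College of Computer Science and Technology, Hangzhou 310013, People's Republic of China
[2] Communication University of Zhejiang, College of Media Engineering, Hangzhou 310018, People's Republic of China
[3] Communication University of Zhejiang, Key Laboratory of Film and TV Media Technology of Zhejiang Province, Hangzhou 310018, People's Republic of China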
Funding
National Natural Science Foundation of China
Keywords
Text-driven 3D face stylization; Geometry deformation; Texture transformation; CLIP model;
DOI
10.1007/s00371-025-04047-9
CLC classification number
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
3D face stylization remains challenging because of limited training samples, diverse style domains, and the complex mapping between ambiguous style features and 3D face structure. To address these issues, we propose ClipStyleFace, a text-driven approach to 3D face stylization that leverages knowledge from CLIP (Contrastive Language-Image Pre-training) to create style variations in both geometry and texture. ClipStyleFace comprises three components. For geometry deformation, a deformable surface models stylized geometric residuals on top of the initial mesh. For texture transformation, we construct a compact parameter space that enables style transfer with a pre-trained albedo generator. Both modules are optimized consistently by distilling semantic-alignment and domain-correction knowledge from the CLIP model. Extensive experiments demonstrate that our approach generates stylized 3D faces that match the target style prompts while preserving identity characteristics and facial details. Our model also holds promise for applications such as animation and image-driven 3D stylized face generation. Our code is available at https://github.com/cutegao715/ClipStyleFace.
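The title names "directional CLIP losses" as the guidance signal, but the abstract does not define them. As a rough illustration only, the sketch below implements the generic directional CLIP loss as introduced in StyleGAN-NADA: it compares the direction in which a render moves in embedding space with the direction from a source prompt to a target style prompt. This is not the ClipStyleFace implementation. The function names are hypothetical, random vectors stand in for real CLIP image and text embeddings, and the paper's actual loss terms, prompts, and weights are not given here.

```python
import numpy as np


def unit(v: np.ndarray) -> np.ndarray:
    """Return v scaled to unit L2 norm."""
    return v / np.linalg.norm(v)


def directional_clip_loss(img_src: np.ndarray, img_sty: np.ndarray,
                          txt_src: np.ndarray, txt_tgt: np.ndarray) -> float:
    """Generic directional loss: 1 - cos(image-space edit, text-space edit).

    img_src / img_sty: embeddings of renders of the original and stylized face.
    txt_src / txt_tgt: embeddings of a neutral prompt and the style prompt.
    Returns 0 when both edits point the same way and 2 when they are opposite.
    """
    d_img = unit(img_sty - img_src)  # how the render moved in embedding space
    d_txt = unit(txt_tgt - txt_src)  # how the prompt moved in embedding space
    return 1.0 - float(np.dot(d_img, d_txt))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 512  # hypothetical embedding size; random vectors replace a CLIP encoder
    txt_src, txt_tgt = rng.normal(size=dim), rng.normal(size=dim)
    img_src = rng.normal(size=dim)

    aligned = img_src + 0.3 * (txt_tgt - txt_src)   # edit follows the text direction
    opposite = img_src - 0.3 * (txt_tgt - txt_src)  # edit runs against it
    unrelated = img_src + rng.normal(size=dim)      # edit in a random direction

    for name, img in [("aligned", aligned), ("opposite", opposite),
                      ("unrelated", unrelated)]:
        print(name, round(directional_clip_loss(img_src, img, txt_src, txt_tgt), 3))
```

Because the loss depends only on the direction of change and not on absolute similarity to the prompt, it penalizes edits that drift away from the requested style even when the stylized render already resembles the target text, which is one common motivation for using it in place of a plain CLIP similarity loss.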
Pages: 16