Disentangled text-driven stylization of 3D faces via directional CLIP losses

Cited by: 0

Authors:

Gao, Wenjing [1]

Li, Xi [1]

Liu, Chang [2,3]

Wang, Jiaojiao [2,3]

Yu, Dingguo [2,3]

Affiliations:

[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310013, Peoples R China

[2] Commun Univ Zhejiang, Coll Media Engn, Hangzhou 310018, Peoples R China

[3] Commun Univ Zhejiang, Key Lab Film & TV Media Technol Zhejiang Prov, Hangzhou 310018, Peoples R China

Source:

The Visual Computer | 2025

Funding:

National Natural Science Foundation of China

Keywords:

Text-driven 3D face stylization; Geometry deformation; Texture transformation; CLIP model;

DOI:

10.1007/s00371-025-04047-9

Chinese Library Classification:

TP31 [Computer Software]

Subject classification codes:

081202; 0835

Abstract:

3D face stylization remains challenging due to limited training samples, diverse style domains, and the complex mapping between ambiguous style features and 3D face structures. To address these issues, we propose ClipStyleFace, a text-driven approach for 3D face stylization that leverages CLIP (Contrastive Language-Image Pre-training) knowledge to create style variations in both geometric and texture structures. ClipStyleFace comprises three components. For geometry deformation, a deformable surface is designed to model stylized geometric residuals on the initial mesh. For texture transformation, we construct a compact parameter space enabling style transfer using a pre-trained albedo generator. Both modules are optimized consistently by distilling semantic alignment and domain correction knowledge from the CLIP model. Extensive experiments demonstrate the effectiveness of our approach in generating stylized 3D faces that match target style prompts while preserving identity characteristics and facial details. Our model also holds promise for applications such as animation and image-driven 3D stylized face generation. Our code is released on https://github.com/cutegao715/ClipStyleFace.
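This record does not spell out the directional CLIP loss named in the title. As a minimal sketch, assuming the formulation commonly used in text-driven stylization (one minus the cosine similarity between the shift in image embeddings and the shift in text embeddings), the idea can be illustrated with plain Python lists standing in for CLIP embeddings. The function names and the toy vectors are illustrative assumptions, not the authors' implementation; the released repository is the authoritative reference.

```python
import math


def _sub(a, b):
    """Element-wise difference a - b of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]


def _cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("direction vector has zero length")
    return dot / (norm_a * norm_b)


def directional_clip_loss(img_src, img_sty, txt_src, txt_tgt):
    """Assumed directional loss: 1 - cos(delta_image, delta_text).

    img_src / img_sty: embeddings of the source and stylized renders.
    txt_src / txt_tgt: embeddings of the source and target style prompts.
    In practice these would come from a CLIP encoder; here they are
    hypothetical placeholder vectors.
    """
    delta_img = _sub(img_sty, img_src)
    delta_txt = _sub(txt_tgt, txt_src)
    return 1.0 - _cosine(delta_img, delta_txt)


# Toy 2-D "embeddings": the image shift is parallel to the text shift.
aligned = directional_clip_loss([0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [2.0, 0.0])
# The image shift is perpendicular to the text shift.
orthogonal = directional_clip_loss([0.0, 0.0], [0.0, 1.0], [0.0, 0.0], [2.0, 0.0])
```

Under this assumed form the loss is smallest when the stylized render moves in embedding space along the same direction as the prompt change, regardless of how far it moves, which is why it penalizes direction rather than absolute similarity to the target text.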

Pages: 16

References: 57 in total
