3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models

Cited by: 5
Authors
Yang, Haibo [1 ]
Chen, Yang [2 ]
Pan, Yingwei [2 ]
Yao, Ting [3 ]
Chen, Zhineng [1 ]
Mei, Tao [3 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Univ Sci & Technol China, Hefei, Peoples R China
[3] HiDream Ai Inc, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Text-driven 3D Stylization; Diffusion Model; Depth;
DOI
10.1145/3581783.3612363
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
3D content creation via text-driven stylization has posed a fundamental challenge to the multimedia and graphics community. Recent advances in cross-modal foundation models (e.g., CLIP) have made this problem feasible. Such approaches commonly leverage CLIP to align the holistic semantics of a stylized mesh with the given text prompt. Nevertheless, it is not trivial to enable more controllable stylization of fine-grained details in 3D meshes based solely on such semantic-level cross-modal supervision. In this work, we propose a new 3DStyle-Diffusion model that triggers fine-grained stylization of 3D meshes with additional controllable appearance and geometric guidance from 2D diffusion models. Technically, 3DStyle-Diffusion first parameterizes the texture of a 3D mesh into reflectance properties and scene lighting using implicit MLP networks. Meanwhile, an accurate depth map of each sampled view is obtained conditioned on the 3D mesh. Then, 3DStyle-Diffusion leverages a pretrained controllable 2D diffusion model to guide the learning of rendered images, encouraging the synthesized image of each view to be semantically aligned with the text prompt and geometrically consistent with the depth map. This design elegantly integrates both image rendering via implicit MLP networks and the diffusion process of image synthesis in an end-to-end fashion, enabling high-quality fine-grained stylization of 3D meshes. We also build a new dataset derived from Objaverse and an evaluation protocol for this task. Through both qualitative and quantitative experiments, we validate the capability of our 3DStyle-Diffusion. Source code and data are available at https://github.com/yanghb22-fdu/3DStyle-Diffusion-Official.
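A minimal, hypothetical PyTorch sketch of the optimization loop the abstract describes: an implicit MLP texture field is rendered from sampled views and updated by score distillation against a depth-conditioned 2D diffusion model. All names below (TextureMLP, render_view, depth_conditioned_sds) are illustrative assumptions, not the authors' released interfaces, and the renderer and diffusion loss are mocked with shape-correct stand-ins so the sketch runs end to end.

    import torch
    import torch.nn as nn

    class TextureMLP(nn.Module):
        # Implicit texture field: surface point (x, y, z) -> reflectance in [0, 1].
        # The paper also models scene lighting implicitly; that part is omitted here.
        def __init__(self, hidden: int = 256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(3, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 3), nn.Sigmoid(),
            )

        def forward(self, xyz: torch.Tensor) -> torch.Tensor:
            return self.net(xyz)

    def render_view(texture: TextureMLP, view_id: int):
        # Stand-in for a differentiable mesh renderer (e.g., nvdiffrast or
        # PyTorch3D): it would shade the mesh with the MLP's reflectance plus
        # scene lighting, and return the exact depth map of the sampled view
        # (depth comes from the fixed mesh geometry, so it carries no gradient).
        points = torch.rand(64 * 64, 3)                   # mock visible surface points
        rgb = texture(points).t().reshape(1, 3, 64, 64)   # mock rendered image
        depth = torch.rand(1, 1, 64, 64)                  # mock per-view depth map
        return rgb, depth

    def depth_conditioned_sds(rgb, depth, prompt_emb):
        # Stand-in for score distillation against a depth-conditioned diffusion
        # model (e.g., a depth ControlNet): the real loss noises the rendered
        # image, runs the frozen UNet conditioned on `prompt_emb` and `depth`,
        # and backpropagates the denoising residual into the renderer. Here a
        # differentiable surrogate keeps the loop executable; `depth` is only
        # passed through, whereas the real UNet consumes it as conditioning.
        target = torch.sigmoid(prompt_emb.mean()) * torch.ones_like(rgb)
        return ((rgb - target) ** 2).mean()

    texture = TextureMLP()
    opt = torch.optim.Adam(texture.parameters(), lr=1e-3)
    prompt_emb = torch.randn(77, 768)  # mock text embedding (CLIP-sized)

    for step in range(100):
        rgb, depth = render_view(texture, view_id=step)   # sample a view per step
        loss = depth_conditioned_sds(rgb, depth, prompt_emb)
        opt.zero_grad()
        loss.backward()
        opt.step()

Because the renderer and the diffusion guidance are chained differentiably, gradients from the 2D model flow back into the implicit texture field, which is the end-to-end integration the abstract emphasizes.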
Pages: 6860-6868
Page count: 9