3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models

Cited by: 5
Authors
Yang, Haibo [1 ]
Chen, Yang [2 ]
Pan, Yingwei [2 ]
Yao, Ting [3 ]
Chen, Zhineng [1 ]
Mei, Tao [3 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Univ Sci & Technol China, Hefei, Peoples R China
[3] HiDream Ai Inc, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Text-driven 3D Stylization; Diffusion Model; Depth;
DOI
10.1145/3581783.3612363
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
3D content creation via text-driven stylization has posed a fundamental challenge to the multimedia and graphics community. Recent advances in cross-modal foundation models (e.g., CLIP) have made this problem feasible. Such approaches commonly leverage CLIP to align the holistic semantics of a stylized mesh with the given text prompt. Nevertheless, it is not trivial to enable more controllable stylization of fine-grained details in 3D meshes based solely on such semantic-level cross-modal supervision. In this work, we propose a new 3DStyle-Diffusion model that triggers fine-grained stylization of 3D meshes with additional controllable appearance and geometric guidance from 2D diffusion models. Technically, 3DStyle-Diffusion first parameterizes the texture of a 3D mesh into reflectance properties and scene lighting using implicit MLP networks. Meanwhile, an accurate depth map of each sampled view is obtained conditioned on the 3D mesh. Then, 3DStyle-Diffusion leverages a pretrained controllable 2D diffusion model to guide the learning of rendered images, encouraging the synthesized image of each view to be semantically aligned with the text prompt and geometrically consistent with the depth map. This design elegantly integrates both image rendering via implicit MLP networks and the diffusion process of image synthesis in an end-to-end fashion, enabling high-quality fine-grained stylization of 3D meshes. We also build a new dataset derived from Objaverse and an evaluation protocol for this task. Through both qualitative and quantitative experiments, we validate the capability of our 3DStyle-Diffusion. Source code and data are available at https://github.com/yanghb22-fdu/3DStyle-Diffusion-Official.
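A minimal, hypothetical PyTorch sketch of the optimization loop the abstract describes: an implicit MLP texture field is rendered from sampled views and updated by score distillation against a depth-conditioned 2D diffusion model. All names below (TextureMLP, render_view, depth_conditioned_sds) are illustrative assumptions, not the authors' released interfaces, and the renderer and diffusion loss are mocked with shape-correct stand-ins so the sketch runs end to end.

    import torch
    import torch.nn as nn

    class TextureMLP(nn.Module):
        # Implicit texture field: surface point (x, y, z) -> reflectance in [0, 1].
        # The paper also models scene lighting implicitly; that part is omitted here.
        def __init__(self, hidden: int = 256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(3, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 3), nn.Sigmoid(),
            )

        def forward(self, xyz: torch.Tensor) -> torch.Tensor:
            return self.net(xyz)

    def render_view(texture: TextureMLP, view_id: int):
        # Stand-in for a differentiable mesh renderer (e.g., nvdiffrast or
        # PyTorch3D): it would shade the mesh with the MLP's reflectance plus
        # scene lighting, and return the exact depth map of the sampled view
        # (depth comes from the fixed mesh geometry, so it carries no gradient).
        points = torch.rand(64 * 64, 3)                   # mock visible surface points
        rgb = texture(points).t().reshape(1, 3, 64, 64)   # mock rendered image
        depth = torch.rand(1, 1, 64, 64)                  # mock per-view depth map
        return rgb, depth

    def depth_conditioned_sds(rgb, depth, prompt_emb):
        # Stand-in for score distillation against a depth-conditioned diffusion
        # model (e.g., a depth ControlNet): the real loss noises the rendered
        # image, runs the frozen UNet conditioned on `prompt_emb` and `depth`,
        # and backpropagates the denoising residual into the renderer. Here a
        # differentiable surrogate keeps the loop executable; `depth` is only
        # passed through, whereas the real UNet consumes it as conditioning.
        target = torch.sigmoid(prompt_emb.mean()) * torch.ones_like(rgb)
        return ((rgb - target) ** 2).mean()

    texture = TextureMLP()
    opt = torch.optim.Adam(texture.parameters(), lr=1e-3)
    prompt_emb = torch.randn(77, 768)  # mock text embedding (CLIP-sized)

    for step in range(100):
        rgb, depth = render_view(texture, view_id=step)   # sample a view per step
        loss = depth_conditioned_sds(rgb, depth, prompt_emb)
        opt.zero_grad()
        loss.backward()
        opt.step()

Because the renderer and the diffusion guidance are chained differentiably, gradients from the 2D model flow back into the implicit texture field, which is the end-to-end integration the abstract emphasizes.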
Pages: 6860-6868
Page count: 9