AnimatableDreamer: Text-Guided Non-rigid 3D Model Generation and Reconstruction with Canonical Score Distillation

Authors
Wang, Xinzhou [1 ,2 ,3 ,4 ]
Wang, Yikai [2 ]
Yee, Junliang [2 ]
Sung, Fuchun [2 ]
Wang, Zhengyi [2 ,3 ]
Wang, Ling [2 ,6 ]
Liu, Pengkun [2 ,7 ]
Sung, Kai [2 ]
Wan, Xintong [8 ]
Xie, Wende [5 ]
Liu, Fangfu [2 ]
He, Bin [1 ]
Affiliations
[1] Tongji Univ, Shanghai, Peoples R China
[2] Tsinghua Univ, Beijing, Peoples R China
[3] ShengShu, Beijing, Peoples R China
[4] Tencent, Shenzhen, Peoples R China
[5] Didi, Beijing, Peoples R China
[6] Xian Res Inst High Tech, Xian, Peoples R China
[7] Fudan Univ, Shanghai, Peoples R China
[8] Zhejiang Univ, Hangzhou, Peoples R China
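Source
COMPUTER VISION - ECCV 2024, PT XXV | 2025 / Vol. 15083
Funding
China Postdoctoral Science Foundation; National Science Foundation (USA); National Natural Science Foundation of China;
Keywords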
4D generation; Diffusion model; Non-rigid reconstruction;
DOI
10.1007/978-3-031-72698-9_19
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Advances in 3D generation have facilitated sequential 3D model generation (a.k.a. 4D generation), yet its application to animatable objects with large motion remains scarce. Our work proposes AnimatableDreamer, a text-to-4D generation framework capable of generating diverse categories of non-rigid objects on skeletons extracted from a monocular video. At its core, AnimatableDreamer is equipped with our novel optimization design dubbed Canonical Score Distillation (CSD), which lifts 2D diffusion for temporally consistent 4D generation. CSD, designed from a score gradient perspective, generates a canonical model with warp-robustness across different articulations. Notably, it also enhances the authenticity of bones and skinning by integrating inductive priors from a diffusion model. Furthermore, with multi-view distillation, CSD infers invisible regions, thereby improving the fidelity of monocular non-rigid reconstruction. Extensive experiments demonstrate the capability of our method in generating high-flexibility text-guided 3D models from monocular video, while also showing improved reconstruction performance over existing non-rigid reconstruction methods. Project page: https://zz7379.github.io/AnimatableDreamer/.
Pages: 321-339
Number of pages: 19