AnimatableDreamer: Text-Guided Non-rigid 3D Model Generation and Reconstruction with Canonical Score Distillation

Cited by: 0

Authors:

Wang, Xinzhou ^{[1,2,3,4]}

Wang, Yikai ^{[2]}

Yee, Junliang ^{[2]}

Sung, Fuchun ^{[2]}

Wang, Zhengyi ^{[2,3]}

Wang, Ling ^{[2,6]}

Liu, Pengkun ^{[2,7]}

Sung, Kai ^{[2]}

Wan, Xintong ^{[8]}

Xie, Wende ^{[5]}

Liu, Fangfu ^{[2]}

He, Bin ^{[1]}

Affiliations:

[1] Tongji Univ, Shanghai, Peoples R China

[2] Tsinghua Univ, Beijing, Peoples R China

[3] ShengShu, Beijing, Peoples R China

[4] Tencent, Shenzhen, Peoples R China

[5] Didi, Beijing, Peoples R China

[6] Xian Res Inst High Tech, Xian, Peoples R China

[7] Fudan Univ, Shanghai, Peoples R China

[8] Zhejiang Univ, Hangzhou, Peoples R China

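Source:

COMPUTER VISION - ECCV 2024, PT XXV | 2025 / Vol. 15083

Funding:

China Postdoctoral Science Foundation; U.S. National Science Foundation; National Natural Science Foundation of China;

Keywords: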

4D generation; Diffusion model; Non-rigid reconstruction;

DOI:

10.1007/978-3-031-72698-9_19

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Advances in 3D generation have facilitated sequential 3D model generation (a.k.a. 4D generation), yet its application to animatable objects with large motion remains scarce. Our work proposes AnimatableDreamer, a text-to-4D generation framework capable of generating diverse categories of non-rigid objects on skeletons extracted from a monocular video. At its core, AnimatableDreamer is equipped with our novel optimization design dubbed Canonical Score Distillation (CSD), which lifts 2D diffusion for temporally consistent 4D generation. CSD, designed from a score gradient perspective, generates a canonical model with warp-robustness across different articulations. Notably, it also enhances the authenticity of bones and skinning by integrating inductive priors from a diffusion model. Furthermore, with multi-view distillation, CSD infers invisible regions, thereby improving the fidelity of monocular non-rigid reconstruction. Extensive experiments demonstrate the capability of our method in generating high-flexibility text-guided 3D models from monocular video, while also showing improved reconstruction performance over existing non-rigid reconstruction methods. Project page: https://zz7379.github.io/AnimatableDreamer/.

Citation

Pages: 321-339

Page count: 19

