MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis

Cited by: 69
Authors
Dabral, Rishabh [1]
Mughal, Muhammad Hamza [1, 2]
Golyanik, Vladislav [1]
Theobalt, Christian [1]
Affiliations
[1] Max Planck Inst Informat, SIC, Saarbrucken, Germany
[2] Saarland Univ, Saarbrucken, Germany
Source
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | 2023
DOI
10.1109/CVPR52729.2023.00941
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Conventional methods for human motion synthesis have either been deterministic or have had to struggle with the trade-off between motion diversity and motion quality. In response to these limitations, we introduce MoFusion, a new denoising-diffusion-based framework for high-quality conditional human motion synthesis that can synthesise long, temporally plausible, and semantically accurate motions based on a range of conditioning contexts (such as music and text). We also present ways to introduce well-known kinematic losses for motion plausibility within the motion-diffusion framework through our scheduled weighting strategy. The learned latent space can be used for several interactive motion-editing applications such as in-betweening, seed-conditioning, and text-based editing, thus providing crucial abilities for virtual-character animation and robotics. Through comprehensive quantitative evaluations and a perceptual user study, we demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature. We urge the reader to watch our supplementary video at https://vcai.mpi-inf.mpg.de/projects/MoFusion/.
Pages: 9760-9770
Number of pages: 11