MotionDiffuse: Text-Driven Human Motion Generation With Diffusion Model

Cited by: 28

Authors:

Zhang, Mingyuan [1]

Cai, Zhongang [1,2,3]

Pan, Liang [1]

Hong, Fangzhou [1]

Guo, Xinying [1]

Yang, Lei [2,3]

Liu, Ziwei [1]

Affiliations:

[1] Nanyang Technol Univ, S Lab, Singapore 639798, Singapore

[2] SenseTime Res, Shenzhen 518100, Peoples R China

[3] Shanghai AI Lab, Shenzhen 518100, Peoples R China

Source:

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2024, Vol. 46, No. 6

Keywords:

Pipelines; Task analysis; Noise reduction; Transformers; Training; Probabilistic logic; Decoding; Conditional motion generation; diffusion model; motion synthesis; text-driven generation

DOI:

10.1109/TPAMI.2024.3355414

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Human motion modeling is important for many modern graphics applications, which typically require professional skills. In order to remove the skill barriers for laymen, recent motion generation methods can directly generate human motions conditioned on natural language. However, it remains challenging to achieve diverse and fine-grained motion generation with various text inputs. To address this problem, we propose MotionDiffuse, one of the first diffusion model-based text-driven motion generation frameworks, which demonstrates several desired properties over existing methods. 1) Probabilistic Mapping. Instead of a deterministic language-motion mapping, MotionDiffuse generates motions through a series of denoising steps in which variations are injected. 2) Realistic Synthesis. MotionDiffuse excels at modeling complicated data distributions and generating vivid motion sequences. 3) Multi-Level Manipulation. MotionDiffuse responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varied text prompts. Our experiments show that MotionDiffuse outperforms existing SoTA methods by convincing margins on text-driven motion generation and action-conditioned motion generation. A qualitative analysis further demonstrates MotionDiffuse's controllability for comprehensive motion generation.

Pages: 4115-4128

Page count: 14
