MotionDiffuse: Text-Driven Human Motion Generation With Diffusion Model

Cited by: 28
Authors
Zhang, Mingyuan [1]
Cai, Zhongang [1, 2, 3]
Pan, Liang [1]
Hong, Fangzhou [1]
Guo, Xinying [1]
Yang, Lei [2, 3]
Liu, Ziwei [1]
Affiliations
[1] Nanyang Technol Univ, S Lab, Singapore 639798, Singapore
[2] SenseTime Res, Shenzhen 518100, Peoples R China
[3] Shanghai AI Lab, Shenzhen 518100, Peoples R China
Keywords
Pipelines; Task analysis; Noise reduction; Transformers; Training; Probabilistic logic; Decoding; Conditional motion generation; diffusion model; motion synthesis; text-driven generation
DOI
10.1109/TPAMI.2024.3355414
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Human motion modeling is important for many modern graphics applications, which typically require professional skills. In order to remove the skill barriers for laymen, recent motion generation methods can directly generate human motions conditioned on natural languages. However, it remains challenging to achieve diverse and fine-grained motion generation with various text inputs. To address this problem, we propose MotionDiffuse, one of the first diffusion model-based text-driven motion generation frameworks, which demonstrates several desired properties over existing methods. 1) Probabilistic Mapping. Instead of a deterministic language-motion mapping, MotionDiffuse generates motions through a series of denoising steps in which variations are injected. 2) Realistic Synthesis. MotionDiffuse excels at modeling complicated data distribution and generating vivid motion sequences. 3) Multi-Level Manipulation. MotionDiffuse responds to fine-grained instructions on body parts, and arbitrary-length motion synthesis with time-varied text prompts. Our experiments show MotionDiffuse outperforms existing SoTA methods by convincing margins on text-driven motion generation and action-conditioned motion generation. A qualitative analysis further demonstrates MotionDiffuse's controllability for comprehensive motion generation.
Pages: 4115-4128
Page count: 14
Related papers
85 in total
  • [1] Language2Pose: Natural Language Grounded Pose Forecasting
    Ahuja, Chaitanya
    Morency, Louis-Philippe
    [J]. 2019 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2019), 2019, : 719 - 728
  • [2] A Stochastic Conditioning Scheme for Diverse Human Motion Prediction
    Aliakbarian, Sadegh
    Saleh, Fatemeh Sadat
    Salzmann, Mathieu
    Petersson, Lars
    Gould, Stephen
    [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 5222 - 5231
  • [3] PoseTrack: A Benchmark for Human Pose Estimation and Tracking
    Andriluka, Mykhaylo
    Iqbal, Umar
    Insafutdinov, Eldar
    Pishchulin, Leonid
    Milan, Anton
    Gall, Juergen
    Schiele, Bernt
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 5167 - 5176
  • [4] TEACH: Temporal Action Composition for 3D Humans
    Athanasiou, Nikos
    Petrovich, Mathis
    Black, Michael J.
    Varol, Gul
    [J]. 2022 INTERNATIONAL CONFERENCE ON 3D VISION, 3DV, 2022, : 414 - 423
  • [5] HP-GAN: Probabilistic 3D human motion prediction via GAN
    Barsoum, Emad
    Kender, John
    Liu, Zicheng
    [J]. PROCEEDINGS 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2018, : 1499 - 1508
  • [6] Text2Gestures: A Transformer-Based Network for Generating Emotive Body Gestures for Virtual Agents
    Bhattacharya, Uttaran
    Rewkowski, Nicholas
    Banerjee, Abhishek
    Guhan, Pooja
    Bera, Aniket
    Manocha, Dinesh
    [J]. 2021 IEEE VIRTUAL REALITY AND 3D USER INTERFACES (VR), 2021, : 160 - 169
  • [7] HuMMan: Multi-modal 4D Human Dataset for Versatile Sensing and Modeling
    Cai, Zhongang
    Ren, Daxuan
    Zeng, Ailing
    Lin, Zhengyu
    Yu, Tao
    Wang, Wenjia
    Fan, Xiangyu
    Gao, Yang
    Yu, Yifan
    Pan, Liang
    Hong, Fangzhou
    Zhang, Mingyuan
    Loy, Chen Change
    Yang, Lei
    Liu, Ziwei
    [J]. COMPUTER VISION, ECCV 2022, PT VII, 2022, 13667 : 557 - 577
  • [8] Cai ZA, 2024, arXiv, arXiv:2110.07588
  • [9] Cao Z., 2020, COMPUTER VISION ECCV
  • [10] Carreira J, 2019, arXiv, DOI 10.48550/arXiv.1907.06987