MLUG: Bootstrapping Language-Motion Pre-Training for Unified Motion-Language Understanding and Generation

Citations: 0
Authors
Luo, Hongliang [1]
Xi, Wei [1]
Tang, Daniel [2]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Comp Sci & Technol, Xian 710049, Peoples R China
[2] Mind Bridge AI Ltd, Ottawa, ON K1S 5R5, Canada
Keywords
motion generation; language motion; unified models
DOI
10.3390/s24227354
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification code
070302; 081704
Abstract
In the realm of computer vision and animation, the generation of human motion from textual descriptions represents a frontier of significant challenge and potential. This paper introduces MLUG, a groundbreaking framework poised to transform motion synthesis by harnessing the power of vision-language pre-training techniques. MLUG addresses the nuanced challenge of creating semantically rich, physically plausible, and emotionally expressive human motions through a novel integration of a unimodal encoder with motion-text contrastive loss, a motion-grounded text encoder, a motion-grounded motion decoder, and a motion length predictor. These components work in concert to align textual descriptions with dynamic motion sequences, offering an innovative solution to the limitations of existing models in open-vocabulary motion generation and emotional expressiveness. Through extensive evaluations, MLUG demonstrates unparalleled effectiveness in generating realistic and diverse motions from a broad spectrum of textual inputs, setting a new benchmark in the field.
Pages: 13