MLUG: Bootstrapping Language-Motion Pre-Training for Unified Motion-Language Understanding and Generation

Cited by: 0

Authors:

Luo, Hongliang [1]

Xi, Wei [1]

Tang, Daniel [2]

Affiliations:

[1] Xi An Jiao Tong Univ, Sch Comp Sci & Technol, Xian 710049, Peoples R China

[2] Mind Bridge AI Ltd, Ottawa, ON K1S 5R5, Canada

Source:

SENSORS | 2024 / Vol. 24 / Issue 22

Keywords:

motion generation; language motion; unified models;

DOI:

10.3390/s24227354

Chinese Library Classification number:

O65 [Analytical Chemistry];

Subject classification codes:

070302 ; 081704 ;

Abstract:

In the realm of computer vision and animation, the generation of human motion from textual descriptions represents a frontier of significant challenge and potential. This paper introduces MLUG, a groundbreaking framework poised to transform motion synthesis by harnessing the power of vision-language pre-training techniques. MLUG addresses the nuanced challenge of creating semantically rich, physically plausible, and emotionally expressive human motions through a novel integration of a unimodal encoder with motion-text contrastive loss, a motion-grounded text encoder, a motion-grounded motion decoder, and a motion length predictor. These components work in concert to align textual descriptions with dynamic motion sequences, offering an innovative solution to the limitations of existing models in open-vocabulary motion generation and emotional expressiveness. Through extensive evaluations, MLUG demonstrates unparalleled effectiveness in generating realistic and diverse motions from a broad spectrum of textual inputs, setting a new benchmark in the field.


Pages: 13

References: 53 in total

