Implicit Neural Representations for Variable Length Human Motion Generation

Cited by: 34
Authors
Cervantes, Pablo [1]
Sekikawa, Yusuke [2]
Sato, Ikuro [1,2]
Shinoda, Koichi [1]
Affiliations
[1] Tokyo Inst Technol, Tokyo, Japan
[2] Denso IT Lab Inc, Tokyo, Japan
Source
COMPUTER VISION - ECCV 2022, PT XVII | 2022 / Vol. 13677
Keywords
Motion generation; Implicit Neural Representations;
DOI
10.1007/978-3-031-19790-1_22
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
We propose an action-conditional human motion generation method using variational implicit neural representations (INRs). The variational formalism enables action-conditional distributions of INRs, from which one can easily sample representations to generate novel human motion sequences. Our method offers variable-length sequence generation by construction, because part of the INR is optimized over a whole sequence of arbitrary length using temporal embeddings; in contrast, previous works reported difficulties in modeling variable-length sequences. We confirm that our method with a Transformer decoder outperforms all relevant methods on the HumanAct12, NTU-RGBD, and UESTC datasets in terms of the realism and diversity of generated motions. Surprisingly, even our method with an MLP decoder consistently outperforms the state-of-the-art Transformer-based auto-encoder. In particular, we show that variable-length motions generated by our method surpass fixed-length motions generated by the state-of-the-art method in terms of realism and diversity. Code is available at https://github.com/PACerv/ImplicitMotion.
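The core idea in the abstract — sample one latent code per sequence, then decode a pose at any timestep from that latent plus a temporal embedding, so sequence length is a free choice at generation time — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the network sizes, the sinusoidal temporal embedding, the unit-Gaussian prior, and all function names are assumptions, and only the simpler MLP-decoder variant mentioned in the abstract is shown.

```python
import torch
import torch.nn as nn

LATENT_DIM, EMBED_DIM, POSE_DIM = 32, 16, 72  # illustrative sizes; POSE_DIM ~ SMPL-like pose vector

def temporal_embedding(t: torch.Tensor, dim: int = EMBED_DIM) -> torch.Tensor:
    """Sinusoidal embedding of time indices, shape (T, dim). Assumed form, not the paper's."""
    freqs = torch.arange(dim // 2, dtype=torch.float32)
    angles = t[:, None] / (100.0 ** (freqs[None, :] / (dim // 2)))
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

# Hypothetical MLP decoder: maps (latent, temporal embedding) -> one pose per timestep.
decoder = nn.Sequential(
    nn.Linear(LATENT_DIM + EMBED_DIM, 128), nn.ReLU(),
    nn.Linear(128, POSE_DIM),
)

def generate(length: int) -> torch.Tensor:
    """Sample a single latent for the whole sequence, decode each queried timestep."""
    z = torch.randn(LATENT_DIM)                    # sample from an assumed unit-Gaussian prior
    t = torch.arange(length, dtype=torch.float32)  # query as many timesteps as desired
    emb = temporal_embedding(t)                    # (length, EMBED_DIM)
    inp = torch.cat([z.expand(length, -1), emb], dim=-1)
    return decoder(inp)                            # (length, POSE_DIM)

# Variable-length generation by construction: only the number of queried timesteps changes.
short, long_seq = generate(30), generate(120)
print(short.shape, long_seq.shape)
```

Because the decoder is queried pointwise in time, nothing in the architecture fixes the sequence length; in the actual method the latent would be drawn from a learned action-conditional distribution rather than the unit prior assumed here.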
Pages: 356-372
Page count: 17