Implicit Neural Representations for Variable Length Human Motion Generation

Cited by: 34
Authors
Cervantes, Pablo [1]
Sekikawa, Yusuke [2]
Sato, Ikuro [1,2]
Shinoda, Koichi [1]
Affiliations
[1] Tokyo Inst Technol, Tokyo, Japan
[2] Denso IT Lab Inc, Tokyo, Japan
Source
COMPUTER VISION - ECCV 2022, PT XVII | 2022 / Vol. 13677
Keywords
Motion generation; Implicit Neural Representations;
DOI
10.1007/978-3-031-19790-1_22
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
We propose an action-conditional human motion generation method using variational implicit neural representations (INRs). The variational formalism enables action-conditional distributions of INRs, from which one can easily sample representations to generate novel human motion sequences. Our method offers variable-length sequence generation by construction, because part of the INR is optimized over a whole sequence of arbitrary length using temporal embeddings; in contrast, previous works reported difficulties in modeling variable-length sequences. We confirm that our method with a Transformer decoder outperforms all relevant methods on the HumanAct12, NTU-RGBD, and UESTC datasets in terms of the realism and diversity of generated motions. Surprisingly, even our method with an MLP decoder consistently outperforms the state-of-the-art Transformer-based auto-encoder. In particular, we show that variable-length motions generated by our method surpass fixed-length motions generated by the state-of-the-art method in terms of realism and diversity. Code is available at https://github.com/PACerv/ImplicitMotion.
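The core idea in the abstract — sample one latent code per sequence, then decode a pose at any timestep from that latent plus a temporal embedding, so sequence length is a free choice at generation time — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the network sizes, the sinusoidal temporal embedding, the unit-Gaussian prior, and all function names are assumptions, and only the simpler MLP-decoder variant mentioned in the abstract is shown.

```python
import torch
import torch.nn as nn

LATENT_DIM, EMBED_DIM, POSE_DIM = 32, 16, 72  # illustrative sizes; POSE_DIM ~ SMPL-like pose vector

def temporal_embedding(t: torch.Tensor, dim: int = EMBED_DIM) -> torch.Tensor:
    """Sinusoidal embedding of time indices, shape (T, dim). Assumed form, not the paper's."""
    freqs = torch.arange(dim // 2, dtype=torch.float32)
    angles = t[:, None] / (100.0 ** (freqs[None, :] / (dim // 2)))
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

# Hypothetical MLP decoder: maps (latent, temporal embedding) -> one pose per timestep.
decoder = nn.Sequential(
    nn.Linear(LATENT_DIM + EMBED_DIM, 128), nn.ReLU(),
    nn.Linear(128, POSE_DIM),
)

def generate(length: int) -> torch.Tensor:
    """Sample a single latent for the whole sequence, decode each queried timestep."""
    z = torch.randn(LATENT_DIM)                    # sample from an assumed unit-Gaussian prior
    t = torch.arange(length, dtype=torch.float32)  # query as many timesteps as desired
    emb = temporal_embedding(t)                    # (length, EMBED_DIM)
    inp = torch.cat([z.expand(length, -1), emb], dim=-1)
    return decoder(inp)                            # (length, POSE_DIM)

# Variable-length generation by construction: only the number of queried timesteps changes.
short, long_seq = generate(30), generate(120)
print(short.shape, long_seq.shape)
```

Because the decoder is queried pointwise in time, nothing in the architecture fixes the sequence length; in the actual method the latent would be drawn from a learned action-conditional distribution rather than the unit prior assumed here.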
Pages: 356-372
Page count: 17