Exploring a rich spatial-temporal dependent relational model for skeleton-based action recognition by bidirectional LSTM-CNN

Cited by: 57
Authors
Zhu, Aichun [1,2]
Wu, Qianyu [1]
Cui, Ran [2]
Wang, Tian [3]
Hang, Wenlong [1]
Hua, Gang [2]
Snoussi, Hichem [4]
Affiliations
[1] Nanjing Tech Univ, Sch Comp Sci & Technol, Nanjing, Peoples R China
[2] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou, Jiangsu, Peoples R China
[3] Beihang Univ, Sch Automat Sci & Elect Engn, Beijing, Peoples R China
[4] Univ Technol Troyes, ICD LM2S, Troyes, France
Funding
National Natural Science Foundation of China
Keywords
Action recognition; Dependent relational model; Spatial-temporal information
DOI
10.1016/j.neucom.2020.07.068
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
With the rapid development of effective and low-cost human skeleton capture systems, skeleton-based action recognition has recently attracted much attention. Most existing methods based on Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks have achieved promising performance for skeleton-based action recognition. However, these approaches are limited in their ability to exploit rich spatial-temporal relational information. In this paper, we propose a new spatial-temporal model with an end-to-end bidirectional LSTM-CNN (BiLSTM-CNN). First, a hierarchical spatial-temporal dependent relational model is used to extract rich spatial-temporal information from the skeleton data. Then a new framework is proposed to fuse the CNN and the LSTM: the skeleton data are structured by the dependent relational model and serve as the input of the proposed network; an LSTM then extracts temporal features, followed by a standard CNN that explores the spatial information in the LSTM output. Finally, experimental results demonstrate the effectiveness of the proposed model on the NTU RGB+D, SBU Interaction, and UTD-MHAD datasets. (C) 2020 Elsevier B.V. All rights reserved.
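To make the fused pipeline described in the abstract concrete, the following minimal PyTorch sketch shows the general shape of an "LSTM first, CNN second" design: per-frame joint coordinates pass through a bidirectional LSTM, and the stacked temporal features are then treated as a 2-D map for a standard CNN. This is not the authors' implementation; the joint count, hidden size, layer sizes, and class count are illustrative assumptions, and the paper's dependent relational preprocessing is not reproduced here.

import torch
import torch.nn as nn

class BiLSTMCNN(nn.Module):
    """Illustrative BiLSTM-CNN sketch (hypothetical sizes, not the
    published architecture): a bidirectional LSTM models temporal
    dependencies, then a small CNN explores spatial structure in the
    stacked LSTM outputs."""

    def __init__(self, num_joints=25, coords=3, hidden=128, num_classes=60):
        super().__init__()
        # Temporal modeling: each frame is a flattened joint-coordinate vector.
        self.lstm = nn.LSTM(input_size=num_joints * coords,
                            hidden_size=hidden,
                            batch_first=True,
                            bidirectional=True)
        # Spatial modeling: treat the (frames x 2*hidden) output of the
        # LSTM as a single-channel image and apply a standard CNN.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        # x: (batch, frames, num_joints * coords)
        feats, _ = self.lstm(x)             # (batch, frames, 2*hidden)
        feats = feats.unsqueeze(1)          # (batch, 1, frames, 2*hidden)
        feats = self.cnn(feats).flatten(1)  # (batch, 64)
        return self.fc(feats)

# Example: a batch of 8 clips of 60 frames, 25 joints with (x, y, z) coords,
# roughly matching the NTU RGB+D skeleton format.
model = BiLSTMCNN()
logits = model(torch.randn(8, 60, 25 * 3))  # -> (8, 60) class scores

The ordering is the point of the sketch: because the LSTM runs before the CNN, the convolutional stage sees temporally contextualized features rather than raw coordinates, which is the fusion direction the abstract describes.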
Pages: 90-100
Page count: 11