Local Feature Fusion Temporal Convolutional Network for Human Action Recognition

Cited by: 0
Authors
Song Z. [1 ]
Zhou Y. [1 ]
Jia J. [1 ]
Xin S. [2 ]
Liu Y. [1 ]
Affiliations
[1] School of Software, Shandong University, Jinan
[2] School of Computer Science and Technology, Shandong University, Qingdao
Source
Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics | 2020, Vol. 32, No. 3
Keywords
Action recognition; Temporal convolutional network; Three-dimensional human skeleton;
DOI
10.3724/SP.J.1089.2020.17934
Abstract
Aiming at the problem of action recognition from three-dimensional human skeleton sequences, a temporal convolutional network (TCN) method combining local feature fusion is proposed. First, the global spatial features of a skeleton sequence are extracted by modeling the changes of all spatial joint locations over the course of an action. Then, according to the topological structure of the human joints and their connection relations, the global spatial features are divided into local spatial features for each body part, and each local feature is fed into a corresponding TCN to learn the internal feature relations of its joints. Finally, the output feature vectors of all parts are fused to learn the cooperative relations between the joints of the different parts and complete the recognition of the action. Classification experiments with the proposed method on the challenging NTU-RGB+D dataset show that, compared with existing CNN-, LSTM- and TCN-based methods, the cross-subject and cross-view classification accuracy is improved to 79.5% and 84.6%, respectively. © 2020, Beijing China Science Journal Publishing Co. Ltd. All rights reserved.
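The pipeline described in the abstract — split the skeleton into body parts, run a temporal convolution per part, then fuse the part features for classification — can be sketched as follows. This is a minimal illustrative NumPy sketch, not the paper's actual architecture: the five-part joint grouping, layer sizes, kernel width, and random weights are all assumptions for demonstration; NTU-RGB+D provides 25 joints with 3D coordinates per frame, and 60 action classes.

```python
import numpy as np

def temporal_conv1d(x, w, b):
    """Valid 1D convolution along the time axis with ReLU.
    x: (T, C_in), w: (K, C_in, C_out), b: (C_out,)."""
    T, _ = x.shape
    K, _, C_out = w.shape
    out = np.empty((T - K + 1, C_out))
    for t in range(T - K + 1):
        # Contract the kernel window over time and input channels.
        out[t] = np.tensordot(x[t:t + K], w, axes=([0, 1], [0, 1])) + b
    return np.maximum(out, 0.0)

# Hypothetical partition of the 25 NTU-RGB+D joints into five body parts
# (the paper's exact grouping may differ).
PARTS = {
    "torso":     [0, 1, 2, 3, 20],
    "left_arm":  [4, 5, 6, 7, 21, 22],
    "right_arm": [8, 9, 10, 11, 23, 24],
    "left_leg":  [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
}

def local_fusion_tcn(seq, rng, k=3, c_out=8, n_classes=60):
    """seq: (T, 25, 3) skeleton sequence -> (n_classes,) scores."""
    part_feats = []
    for joints in PARTS.values():
        # Local spatial feature: this part's joint coordinates per frame.
        x = seq[:, joints, :].reshape(seq.shape[0], -1)
        # Per-part TCN branch with (randomly initialized) weights.
        w = 0.1 * rng.standard_normal((k, x.shape[1], c_out))
        h = temporal_conv1d(x, w, np.zeros(c_out))
        part_feats.append(h.mean(axis=0))   # pool over time
    # Fuse the part feature vectors, then classify.
    fused = np.concatenate(part_feats)
    w_fc = 0.1 * rng.standard_normal((fused.size, n_classes))
    return fused @ w_fc
```

In a real implementation each branch would be a trained multi-layer TCN and the fusion would feed further layers; here one untrained convolution per part suffices to show the data flow from global sequence to per-part branches to a fused class-score vector.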
Pages: 418-424
Number of pages: 6