Enhanced view-independent representation method for skeleton-based human action recognition

Cited by: 0
Authors
Jiang Y. [1,2]
Lu L. [1]
Xu J. [3]
Affiliations
[1] School of Computer Science and Engineering, South China University of Technology, Guangzhou
[2] Unit 95795 of the People’s Liberation Army, Guilin
[3] Unit 95269 of the People’s Liberation Army, Guangzhou
Keywords
Action recognition; Attention mechanism; Cumulative Euclidean distance; Euler angles; View-independent representation
DOI
10.1504/IJICT.2021.117047
Abstract
Human action recognition is an important branch of computer vision. Recognition from skeletal data is challenging because the joints carry complex spatiotemporal information. In this work, we propose an action recognition method that consists of three parts: a view-independent representation, a combination with cumulative Euclidean distance, and a combined model. First, the action sequence is transformed into a view-independent representation. Second, this representation is weighted by cumulative Euclidean distances, so that the joints most closely associated with the action are emphasised. Finally, a combined model, consisting of a regular three-layer BLSTM network and a temporal attention module, extracts features from the representation and classifies the actions. Experimental results on two multi-view benchmark datasets, Northwestern-UCLA and NTU RGB+D, demonstrate the effectiveness of the complete method. Despite its simple architecture and its use of only one type of action feature, the method significantly improves recognition performance and is strongly robust. Copyright © 2021 Inderscience Enterprises Ltd.
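
The abstract describes the pipeline only at a high level; the following is a minimal sketch of the two preprocessing steps, assuming a skeleton sequence of shape (T, J, 3) and hypothetical joint indices for the hip, spine, and shoulder (the paper's exact Euler-angle construction may differ from this body-frame rotation):

import numpy as np

def view_independent(seq, hip=0, spine=1, shoulder=4):
    # seq: (T, J, 3) array of joint coordinates.
    # Centre each frame on the hip joint, then rotate all joints into a
    # body-aligned frame so the representation no longer depends on the
    # camera view. The joint indices here are illustrative assumptions.
    out = np.empty_like(seq)
    for t, frame in enumerate(seq):
        centred = frame - frame[hip]              # translation invariance
        x = centred[shoulder] / np.linalg.norm(centred[shoulder])
        up = centred[spine] / np.linalg.norm(centred[spine])
        z = np.cross(x, up)
        z /= np.linalg.norm(z)
        y = np.cross(z, x)
        R = np.stack([x, y, z])                   # rows form the new basis
        out[t] = centred @ R.T                    # rotation invariance
    return out

def cumulative_distance_weights(seq):
    # Weight each joint by its cumulative Euclidean displacement over the
    # whole sequence, so joints that move more, and thus carry more of the
    # action, are emphasised.
    disp = np.linalg.norm(np.diff(seq, axis=0), axis=-1)   # (T-1, J)
    total = disp.sum(axis=0)                               # (J,)
    return total / total.sum()

The combined model can be sketched in the same hedged spirit: the three BLSTM layers come from the abstract, while the hidden size, the class count (60 for NTU RGB+D), and the exact form of the temporal attention are assumptions:

import torch
import torch.nn as nn

class BLSTMTemporalAttention(nn.Module):
    # Three-layer bidirectional LSTM followed by a temporal attention
    # module that pools per-frame features into one clip-level descriptor.
    def __init__(self, in_dim, hidden=128, num_classes=60):
        super().__init__()
        self.blstm = nn.LSTM(in_dim, hidden, num_layers=3,
                             bidirectional=True, batch_first=True)
        self.score = nn.Linear(2 * hidden, 1)     # one score per frame
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                         # x: (B, T, in_dim)
        h, _ = self.blstm(x)                      # (B, T, 2*hidden)
        a = torch.softmax(self.score(h), dim=1)   # temporal weights
        pooled = (a * h).sum(dim=1)               # attention-weighted sum
        return self.fc(pooled)

Multiplying the view-independent coordinates by the per-joint weights before flattening them into the model's input is one plausible reading of the "combination with cumulative Euclidean distance" step.
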
Pages: 201-218
Number of pages: 17