A review of human action recognition based on deep learning

Cited by: 0
Authors
Zhu Y. [1]
Zhao J.-K. [1]
Wang Y.-N. [1]
Zheng B.-B. [1]
Affiliation
[1] School of Information Science and Engineering, East China University of Science and Technology, Shanghai
Source
Zidonghua Xuebao/Acta Automatica Sinica | 2016 / Vol. 42 / No. 06
Funding
National Natural Science Foundation of China
Keywords
Action recognition; Convolution neural network (CNN); Deep learning; Restricted Boltzmann machine (RBM)
DOI
10.16383/j.aas.2016.c150710
Chinese Library Classification (CLC) number
TN911 [Communication theory]
Discipline classification code
081002
Abstract
Human action recognition is an active research topic in intelligent video analysis and is attracting extensive attention in both academic and engineering communities. The technology is an important basis for intelligent video analysis, video tagging, human-computer interaction, and many other fields. Deep learning has achieved remarkable results in still-image feature extraction and is gradually being extended to the time sequences of human action videos. This paper reviews traditionally designed action recognition methods, such as those based on spatial-temporal interest points, and introduces and analyzes different human action recognition frameworks based on deep learning, including the convolutional neural network (CNN), the independent subspace analysis (ISA) model, the restricted Boltzmann machine (RBM), and the recurrent neural network (RNN). Finally, it summarizes the advantages and disadvantages of these methods. Copyright © 2016 Acta Automatica Sinica. All rights reserved.
Pages: 848-857
Page count: 9