Deep Learning for Human Action Recognition: A Comprehensive Review

Cited by: 2
Authors
Duc-Quang Vu [1, 2]
Trang Phung Thi Thu [3]
Ngan Le [4]
Jia-Ching Wang [1]
Affiliations
[1] Natl Cent Univ, Dept Comp Sci & Informat Engn, Taoyuan, Taiwan
[2] Thai Nguyen Univ Educ, Thai Nguyen, Vietnam
[3] Thai Nguyen Univ, Thai Nguyen, Vietnam
[4] Univ Arkansas, Dept Comp Sci & Comp Engn, Fayetteville, AR 72701 USA
Keywords
Action recognition; supervised learning; self-supervised learning; deep learning; deep neural networks; NETWORKS;
DOI
10.1561/116.00000068
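Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809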
Abstract
Over the past several years, computer vision has made remarkable progress in numerous applications, particularly in human activity analysis. Human action recognition, which aims to automatically analyze and recognize the actions taking place in a video, has been applied in many domains. This paper presents a comprehensive survey of deep learning-based approaches and techniques for human activity analysis. First, we define the action recognition problem and outline its challenges. Second, we survey feature representation methods. Third, we categorize human activity recognition methodologies and discuss their advantages and limitations. In particular, we divide human action recognition methods into three main categories according to their training mechanisms: supervised learning, semi-supervised learning, and self-supervised learning. For each category, we further analyze the existing network architectures, their performance, and source code availability. Fourth, we provide a detailed analysis of existing publicly available datasets for human action recognition, both small-scale and large-scale. Finally, we discuss open issues and future research directions.
Pages: 40