Human Action Recognition Based on Transfer Learning Approach

Cited by: 34
Authors
Abdulazeem, Yousry [1 ]
Balaha, Hossam Magdy [2 ]
Bahgat, Waleed M. [3 ]
Badawy, Mahmoud [2 ]
Affiliations
[1] Misr Higher Inst Engn & Technol, Comp Engn Dept, Mansoura 35516, Egypt
[2] Mansoura Univ, Fac Engn, Comp & Syst Engn Dept, Mansoura 35511, Egypt
[3] Mansoura Univ, Fac Comp & Informat Sci, Informat Technol Dept, Mansoura 35511, Egypt
Keywords
Transfer learning; Feature extraction; Three-dimensional displays; Solid modeling; Deep learning; Computer architecture; Training; Convolutional neural network (CNN); human action recognition (HAR); long short-term memory (LSTM); spatiotemporal info; transfer learning (TL); DROPOUT; LSTM;
DOI
10.1109/ACCESS.2021.3086668
CLC number
TP [Automation and computer technology];
Discipline code
0812 ;
Abstract
Human action recognition techniques have gained significant attention among next-generation technologies due to their specific features and high capability to inspect video sequences to understand human actions. As a result, many fields have benefited from human action recognition techniques. Deep learning techniques have played a primary role in many human action recognition approaches, and transfer learning is ushering in a new era of learning. Accordingly, this study's main objective is to propose a framework with three main phases for human action recognition: pre-training, preprocessing, and recognition. The framework presents a set of novel techniques that are threefold: (i) in the pre-training phase, a standard convolutional neural network is trained on a generic dataset to adjust its weights; (ii) this pre-trained model is then applied to the target dataset to perform the recognition process; and (iii) the recognition phase exploits convolutional neural networks and long short-term memory to apply five different architectures. Three architectures are stand-alone and single-stream, while the other two combine the first three in a two-stream style. Experimental results show that the first three architectures recorded accuracies of 83.24%, 90.72%, and 90.85%, respectively, and the last two achieved accuracies of 93.48% and 94.87%, respectively. Moreover, the recorded results outperform other state-of-the-art models in the same field.
Pages: 82058-82069
Page count: 12
References
73 in total
[11]  
Cao Dong, 2019, arXiv:1908.08916
[12]  
Chakraborty Mainak, 2021, International Conference on Innovative Computing and Communications. Proceedings of ICICC 2020. Advances in Intelligent Systems and Computing (AISC 1166), P331, DOI 10.1007/978-981-15-5148-2_30
[13]   Action Recognition with Temporal Scale-Invariant Deep Learning Framework [J].
Chen, Huafeng ;
Chen, Jun ;
Hu, Ruimin ;
Chen, Chen ;
Wang, Zhongyuan .
CHINA COMMUNICATIONS, 2017, 14 (02) :163-172
[14]   Generalized Rank Pooling for Activity Recognition [J].
Cherian, Anoop ;
Fernando, Basura ;
Harandi, Mehrtash ;
Gould, Stephen .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :1581-1590
[15]   Xception: Deep Learning with Depthwise Separable Convolutions [J].
Chollet, Francois .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :1800-1807
[16]  
Ciresan D, 2012, PROC CVPR IEEE, P3642, DOI 10.1109/CVPR.2012.6248110
[17]   Transfer learning for activity recognition: a survey [J].
Cook, Diane ;
Feuz, Kyle D. ;
Krishnan, Narayanan C. .
KNOWLEDGE AND INFORMATION SYSTEMS, 2013, 36 (03) :537-556
[18]  
Dahl GE, 2013, INT CONF ACOUST SPEE, P8609, DOI 10.1109/ICASSP.2013.6639346
[19]   Human action recognition using two-stream attention based LSTM networks [J].
Dai, Cheng ;
Liu, Xingang ;
Lai, Jinfeng .
APPLIED SOFT COMPUTING, 2020, 86
[20]  
Dai W., 2019, 2019 22 INT C ELECT, P1