Human Action Recognition Based on Transfer Learning Approach

Cited by: 34
Authors
Abdulazeem, Yousry [1 ]
Balaha, Hossam Magdy [2 ]
Bahgat, Waleed M. [3 ]
Badawy, Mahmoud [2 ]
Affiliations
[1] Misr Higher Inst Engn & Technol, Comp Engn Dept, Mansoura 35516, Egypt
[2] Mansoura Univ, Fac Engn, Comp & Syst Engn Dept, Mansoura 35511, Egypt
[3] Mansoura Univ, Fac Comp & Informat Sci, Informat Technol Dept, Mansoura 35511, Egypt
Keywords
Transfer learning; Feature extraction; Three-dimensional displays; Solid modeling; Deep learning; Computer architecture; Training; Convolutional neural network (CNN); human action recognition (HAR); long short-term memory (LSTM); spatiotemporal info; transfer learning (TL); DROPOUT; LSTM;
DOI
10.1109/ACCESS.2021.3086668
CLC number
TP [Automation and computer technology];
Discipline code
0812 ;
Abstract
Human action recognition techniques have gained significant attention among next-generation technologies due to their specific features and high capability to inspect video sequences to understand human actions. As a result, many fields have benefited from human action recognition techniques. Deep learning techniques have played a primary role in many human action recognition approaches, and transfer learning is ushering in a new era of learning. Accordingly, this study's main objective is to propose a framework with three main phases for human action recognition: pre-training, preprocessing, and recognition. The framework presents a set of novel techniques that are threefold: (i) in the pre-training phase, a standard convolutional neural network is trained on a generic dataset to adjust its weights; (ii) this pre-trained model is then applied to the target dataset to perform the recognition process; and (iii) the recognition phase exploits convolutional neural networks and long short-term memory to apply five different architectures. Three architectures are stand-alone and single-stream, while the other two combine the first three in a two-stream style. Experimental results show that the first three architectures recorded accuracies of 83.24%, 90.72%, and 90.85%, respectively, and the last two achieved accuracies of 93.48% and 94.87%, respectively. Moreover, the recorded results outperform other state-of-the-art models in the same field.
Pages: 82058-82069
Page count: 12
References
73 in total
[11]  
Cao Dong, 2019, arXiv:1908.08916
[12]  
Chakraborty Mainak, 2021, International Conference on Innovative Computing and Communications. Proceedings of ICICC 2020. Advances in Intelligent Systems and Computing (AISC 1166), P331, DOI 10.1007/978-981-15-5148-2_30
[13]   Action Recognition with Temporal Scale-Invariant Deep Learning Framework [J].
Chen, Huafeng ;
Chen, Jun ;
Hu, Ruimin ;
Chen, Chen ;
Wang, Zhongyuan .
CHINA COMMUNICATIONS, 2017, 14 (02) :163-172
[14]   Generalized Rank Pooling for Activity Recognition [J].
Cherian, Anoop ;
Fernando, Basura ;
Harandi, Mehrtash ;
Gould, Stephen .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :1581-1590
[15]   Xception: Deep Learning with Depthwise Separable Convolutions [J].
Chollet, Francois .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :1800-1807
[16]  
Ciresan D, 2012, PROC CVPR IEEE, P3642, DOI 10.1109/CVPR.2012.6248110
[17]   Transfer learning for activity recognition: a survey [J].
Cook, Diane ;
Feuz, Kyle D. ;
Krishnan, Narayanan C. .
KNOWLEDGE AND INFORMATION SYSTEMS, 2013, 36 (03) :537-556
[18]  
Dahl GE, 2013, INT CONF ACOUST SPEE, P8609, DOI 10.1109/ICASSP.2013.6639346
[19]   Human action recognition using two-stream attention based LSTM networks [J].
Dai, Cheng ;
Liu, Xingang ;
Lai, Jinfeng .
APPLIED SOFT COMPUTING, 2020, 86
[20]  
Dai W., 2019, 2019 22 INT C ELECT, P1