Human Action Recognition Based on Transfer Learning Approach

Cited by: 34

Authors:

Abdulazeem, Yousry [1]

Balaha, Hossam Magdy [2]

Bahgat, Waleed M. [3]

Badawy, Mahmoud [2]

Affiliations:

[1] Misr Higher Inst Engn & Technol, Comp Engn Dept, Mansoura 35516, Egypt

[2] Mansoura Univ, Fac Engn, Comp & Syst Engn Dept, Mansoura 35511, Egypt

[3] Mansoura Univ, Fac Comp & Informat Sci, Informat Technol Dept, Mansoura 35511, Egypt

Source:

IEEE Access | 2021 | Vol. 9

Keywords:

Transfer learning; Feature extraction; Three-dimensional displays; Solid modeling; Deep learning; Computer architecture; Training; Convolutional neural network (CNN); human action recognition (HAR); long short-term memory (LSTM); spatiotemporal info; transfer learning (TL); DROPOUT; LSTM;

DOI:

10.1109/ACCESS.2021.3086668

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Human action recognition techniques have gained significant attention among next-generation technologies due to their ability to inspect video sequences and understand human actions. As a result, many fields have benefited from them. Deep learning techniques have played a primary role in many approaches to human action recognition, and transfer learning is now spreading as a new learning paradigm. Accordingly, this study's main objective is to propose a framework with three main phases for human action recognition: pre-training, preprocessing, and recognition. The framework's contributions are three-fold: (i) in the pre-training phase, a standard convolutional neural network is trained on a generic dataset to adjust its weights; (ii) to perform the recognition process, this pre-trained model is then applied to the target dataset; and (iii) the recognition phase exploits convolutional neural network and long short-term memory to apply five different architectures. Three architectures are stand-alone and single-stream, while the other two combine the first three in a two-stream style. Experimental results show that the first three architectures recorded accuracies of 83.24%, 90.72%, and 90.85%, respectively. The last two architectures achieved accuracies of 93.48% and 94.87%, respectively. Moreover, the recorded results outperform other state-of-the-art models in the same field.


Pages: 82058-82069

Number of pages: 12

