A Deep Bidirectional LSTM Model Enhanced by Transfer-Learning-Based Feature Extraction for Dynamic Human Activity Recognition

Cited by: 15
Authors
Hassan, Najmul [1]
Miah, Abu Saleh Musa [1]
Shin, Jungpil [1]
Affiliations
[1] University of Aizu, School of Computer Science and Engineering, Aizu-Wakamatsu 965-8580, Japan
Source
APPLIED SCIENCES-BASEL | 2024, Vol. 14, Issue 02
Keywords
pre-trained neural networks; HAR; MobileNetV2; deep bidirectional LSTM; neural networks; data streams; CNN; fusion
DOI
10.3390/app14020603
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification number
0703
Abstract
Dynamic human activity recognition (HAR) is a domain of study that is currently receiving considerable attention within the fields of computer vision and pattern recognition. The growing need for artificial-intelligence (AI)-driven systems to evaluate human behaviour and bolster security underscores the timeliness of this research. Although numerous researchers have developed dynamic HAR frameworks that use diverse pre-trained architectures for feature extraction and classification, existing systems still suffer from suboptimal accuracy and high computational complexity. These challenges arise from the size of video-based datasets and the inherent similarity among their samples. To address them, we propose a dynamic HAR technique that couples a deep bidirectional long short-term memory (Deep BiLSTM) model with a pre-trained, transfer-learning-based feature-extraction approach. Our approach begins by using a convolutional neural network (CNN), specifically MobileNetV2, to extract deep features from video frames. These features are then fed into an optimized Deep BiLSTM network, which models temporal dependencies in the data to produce the final predictions. During the testing phase, an iterative fine-tuning procedure updates the hyperparameters of the trained model, ensuring adaptability to varying scenarios. The proposed model's efficacy was rigorously evaluated on three benchmark datasets, UCF11, UCF Sports, and JHMDB, achieving accuracies of 99.20%, 93.3%, and 76.30%, respectively. This high accuracy substantiates the superiority of the proposed model and signals a promising advancement in activity recognition.
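As a rough illustration of the pipeline the abstract describes, here is a minimal TensorFlow/Keras sketch: a frozen, ImageNet-pre-trained MobileNetV2 extracts per-frame features, and a two-layer (deep) bidirectional LSTM classifies the resulting sequence. The frame count, layer widths, and training settings below are illustrative assumptions, not the authors' reported configuration; only the class count (11, matching UCF11) follows from the datasets named in the abstract.

```python
# Minimal sketch of the MobileNetV2 -> Deep BiLSTM pipeline described in the
# abstract. NUM_FRAMES and the LSTM widths are assumed values for illustration.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES = 30          # frames sampled per clip (assumed)
FRAME_SIZE = (224, 224)  # MobileNetV2's default input resolution
NUM_CLASSES = 11         # UCF11 has 11 action classes

# Frozen MobileNetV2 backbone used purely as a transfer-learning feature
# extractor; global average pooling yields one 1280-d vector per frame.
backbone = tf.keras.applications.MobileNetV2(
    include_top=False, weights="imagenet",
    input_shape=(*FRAME_SIZE, 3), pooling="avg")
backbone.trainable = False

model = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, *FRAME_SIZE, 3)),
    # Scale raw [0, 255] pixels to [-1, 1], as MobileNetV2 expects.
    layers.Rescaling(1.0 / 127.5, offset=-1.0),
    # Apply the CNN to every frame independently.
    layers.TimeDistributed(backbone),
    # Two stacked BiLSTM layers model temporal dependencies in both directions.
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Wrapping the frozen backbone in TimeDistributed keeps the CNN strictly per-frame, so all temporal reasoning is left to the BiLSTM stack; this mirrors the two-stage design (spatial feature extraction, then sequence modelling) that the abstract attributes to the proposed method.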
Pages: 18