A Deep Bidirectional LSTM Model Enhanced by Transfer-Learning-Based Feature Extraction for Dynamic Human Activity Recognition

Cited by: 15
Authors
Hassan, Najmul [1]
Miah, Abu Saleh Musa [1]
Shin, Jungpil [1]
Affiliations
[1] University of Aizu, School of Computer Science and Engineering, Aizu-Wakamatsu 965-8580, Japan
Source
APPLIED SCIENCES-BASEL | 2024, Vol. 14, Issue 02
Keywords
pre-trained neural networks; HAR; MobileNetV2; deep bidirectional LSTM; neural networks; data streams; CNN; fusion
DOI
10.3390/app14020603
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification number
0703
Abstract
Dynamic human activity recognition (HAR) is a domain of study that is currently receiving considerable attention within the fields of computer vision and pattern recognition. The growing need for artificial-intelligence (AI)-driven systems to evaluate human behaviour and bolster security underscores the timeliness of this research. Although numerous researchers have developed dynamic HAR frameworks that use diverse pre-trained architectures for feature extraction and classification, existing systems still suffer from suboptimal accuracy and high computational complexity. These challenges arise from the size of video-based datasets and the inherent similarity among their samples. To address them, we propose a dynamic HAR technique that couples a deep bidirectional long short-term memory (Deep BiLSTM) model with a pre-trained, transfer-learning-based feature-extraction approach. Our approach begins by using a convolutional neural network (CNN), specifically MobileNetV2, to extract deep features from video frames. These features are then fed into an optimized Deep BiLSTM network, which models temporal dependencies in the data to produce the final predictions. During the testing phase, an iterative fine-tuning procedure updates the hyperparameters of the trained model, ensuring adaptability to varying scenarios. The proposed model's efficacy was rigorously evaluated on three benchmark datasets, UCF11, UCF Sports, and JHMDB, achieving accuracies of 99.20%, 93.3%, and 76.30%, respectively. This high accuracy substantiates the superiority of the proposed model and signals a promising advancement in activity recognition.
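As a rough illustration of the pipeline the abstract describes, here is a minimal TensorFlow/Keras sketch: a frozen, ImageNet-pre-trained MobileNetV2 extracts per-frame features, and a two-layer (deep) bidirectional LSTM classifies the resulting sequence. The frame count, layer widths, and training settings below are illustrative assumptions, not the authors' reported configuration; only the class count (11, matching UCF11) follows from the datasets named in the abstract.

```python
# Minimal sketch of the MobileNetV2 -> Deep BiLSTM pipeline described in the
# abstract. NUM_FRAMES and the LSTM widths are assumed values for illustration.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES = 30          # frames sampled per clip (assumed)
FRAME_SIZE = (224, 224)  # MobileNetV2's default input resolution
NUM_CLASSES = 11         # UCF11 has 11 action classes

# Frozen MobileNetV2 backbone used purely as a transfer-learning feature
# extractor; global average pooling yields one 1280-d vector per frame.
backbone = tf.keras.applications.MobileNetV2(
    include_top=False, weights="imagenet",
    input_shape=(*FRAME_SIZE, 3), pooling="avg")
backbone.trainable = False

model = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, *FRAME_SIZE, 3)),
    # Scale raw [0, 255] pixels to [-1, 1], as MobileNetV2 expects.
    layers.Rescaling(1.0 / 127.5, offset=-1.0),
    # Apply the CNN to every frame independently.
    layers.TimeDistributed(backbone),
    # Two stacked BiLSTM layers model temporal dependencies in both directions.
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Wrapping the frozen backbone in TimeDistributed keeps the CNN strictly per-frame, so all temporal reasoning is left to the BiLSTM stack; this mirrors the two-stage design (spatial feature extraction, then sequence modelling) that the abstract attributes to the proposed method.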
Pages: 18