A hybrid deep learning framework for daily living human activity recognition with cluster-based video summarization

Cited by: 0
Authors
Hossain S. [1, 2]
Deb K. [1]
Sakib S. [1]
Sarker I.H. [3]
Affiliations
[1] Department of Computer Science and Engineering, Chittagong University of Engineering & Technology (CUET), Chattogram
[2] Department of Computer Science and Engineering, Green University of Bangladesh, Dhaka, Narayanganj
[3] Centre for Securing Digital Futures, School of Science, Edith Cowan University, Perth, 6027, WA
Keywords
Behavioral analytics; Deep learning; Human activity recognition; Keyframe selection; Video summarization
DOI
10.1007/s11042-024-19022-0
Abstract
In assisted living facilities and nursing homes, Human Activity Recognition (HAR) can be used to monitor residents' movements and actions, helping ensure they receive proper care and attention. HAR is also valuable for reviewing and updating emergency response plans that address unusual behavior patterns in daily living activities. Recognizing activity from video data entails extracting spatial features and then determining how those features vary over time. To capture the semantic relationships across sequential frames, a specified number of frames must be sampled from each video. Although the sampled frames play an essential role, they are often selected at random or by skipping frames sequentially, which causes temporal information loss. A video summary that stays faithful to the original video while presenting its most important details is a possible solution to this problem. To address the issue, we propose a cluster-based keyframe selection approach that generates a video summary by extracting the relevant frames. Additionally, we explore two deep learning strategies for action recognition to determine which is more effective: (a) a pose-based activity recognition model and (b) a single hybrid pre-trained CNN-LSTM model. The experimental findings demonstrate the efficacy of the single hybrid CNN-LSTM technique. Our proposed model yields a mean accuracy of 95.56% on the RGB video modality of the MSRDailyActivity3D dataset, surpassing several recent multimodal works on that dataset. On MSRDailyActivity3D, the proposed single hybrid CNN-LSTM model also achieves 95.12% precision, 95.11% recall, and a 95.03% F1 score. In addition, the proposed model is evaluated on two challenging datasets: PRECIS HAR and UCF11.
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
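The cluster-based keyframe selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm or frame features the authors use, so everything below is an assumption: plain k-means over per-frame feature vectors (for example, color histograms), with the frame nearest each cluster centroid kept as the keyframe. The function name `select_keyframes` and its parameters are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_keyframes(
    features: Sequence[Sequence[float]], k: int, iters: int = 20
) -> List[int]:
    """Return indices of up to k keyframes, in temporal order.

    Assumption: k-means is used for clustering; the paper's actual
    clustering method and features may differ.
    """
    n = len(features)
    if n == 0 or k <= 0:
        return []
    k = min(k, n)

    # Deterministic initialization: centroids start at evenly spaced frames.
    centroids = [list(features[(2 * i + 1) * n // (2 * k)]) for i in range(k)]

    assign = None
    for _ in range(iters):
        # Assignment step: attach every frame to its nearest centroid.
        new_assign = [
            min(range(k), key=lambda c: _sq_dist(f, centroids[c]))
            for f in features
        ]
        if new_assign == assign:
            break  # assignments are stable, so k-means has converged
        assign = new_assign

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [features[i] for i in range(n) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    # Keep the frame closest to each non-empty cluster's centroid.
    keyframes = []
    for c in range(k):
        members = [i for i in range(n) if assign[i] == c]
        if members:
            keyframes.append(
                min(members, key=lambda i: _sq_dist(features[i], centroids[c]))
            )
    return sorted(keyframes)
```

Returning the indices in sorted order keeps the summary in temporal sequence, which matters if the selected frames are later fed to a sequence model such as an LSTM. Unlike random or fixed-stride sampling, this picks one representative frame per visually distinct group, although the number of clusters `k` still has to be chosen by the user.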
Pages: 6219-6272
Page count: 53