Human Interaction Recognition in Surveillance Videos Using Hybrid Deep Learning and Machine Learning Models

Cited by: 0
Authors
Khean, Vesal [1 ]
Kim, Chomyong [2 ]
Ryu, Sunjoo [2 ]
Khan, Awais [1 ]
Hong, Min Kyung [3 ]
Kim, Eun Young [4 ]
Kim, Joungmin [5 ]
Nam, Yunyoung [3 ]
Affiliations
[1] Soonchunhyang Univ, Dept ICT Convergence, Asan 31538, South Korea
[2] Soonchunhyang Univ, ICT Convergence Res Ctr, Asan 31538, South Korea
[3] Soonchunhyang Univ, Emot & Intelligent Child Care Convergence Ctr, Asan 31538, South Korea
[4] Soonchunhyang Univ, Dept Occupat Therapy, Asan 31538, South Korea
[5] Soonchunhyang Univ, Coll Hyangsul Nanum, Asan 31538, South Korea
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2024, Vol. 81, No. 1
Funding
National Research Foundation of Singapore
Keywords
Convolutional neural network; deep learning; human interaction recognition; ResNet; skeleton joint key points; human pose estimation; hybrid deep learning and machine learning; HUMAN POSE ESTIMATION;
DOI
10.32604/cmc.2024.056767
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
Human Interaction Recognition (HIR) is a challenging problem in computer vision because it involves multiple individuals and their mutual interactions within video frames generated from their movements. HIR requires more sophisticated analysis than Human Action Recognition (HAR): HAR focuses solely on individual activities such as walking or running, whereas HIR must capture the interactions between people. This research aims to develop a robust system that recognizes five common human interactions (hugging, kicking, pushing, pointing, and no interaction) from video sequences captured by multiple cameras. In this study, a hybrid Deep Learning (DL) and Machine Learning (ML) model was employed to improve classification accuracy and generalizability. The dataset was collected in an indoor environment with four-channel cameras capturing the five types of interactions among 13 participants. The data was processed by a DL model with a fine-tuned ResNet (Residual Network) architecture based on 2D Convolutional Neural Network (CNN) layers for feature extraction. Six commonly used ML algorithms (SVM, KNN, RF, DT, NB, and XGBoost) were then trained on the extracted features for interaction classification. The results demonstrate a high accuracy of 95.45% in classifying human interactions. The hybrid approach enabled effective learning, resulting in highly accurate performance across different interaction types. Future work will extend this architecture to more complex scenarios involving more individuals.
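The abstract describes a two-stage pipeline: a fine-tuned ResNet extracts feature vectors from video frames, and a conventional ML classifier (SVM, KNN, RF, DT, NB, or XGBoost) assigns one of the five interaction labels. The second stage can be sketched as follows; this is a minimal illustration only, assuming the ResNet features have already been extracted (synthetic 512-dimensional vectors stand in for them here, and the class count, feature dimension, and SVM choice are assumptions, not details taken from the paper):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic stand-in for ResNet features: 5 interaction classes
# (hug, kick, push, point, none), 200 clips each, 512-dim vectors
# (the pooled-output size of a ResNet-18/34 backbone).
n_classes, n_per_class, dim = 5, 200, 512
centers = rng.normal(0.0, 1.0, (n_classes, dim))
X = np.vstack([c + 0.5 * rng.normal(0.0, 1.0, (n_per_class, dim)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

# Stage 2 of the hybrid pipeline: train an ML classifier on the features.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"SVM accuracy on synthetic features: {acc:.2f}")
```

Any of the other five classifiers named in the abstract could be swapped in at the `SVC` line with the same fit/predict interface.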
Pages: 773-787
Page count: 15