Feature Fusion for Human Activity Recognition using Parameter-Optimized Multi-Stage Graph Convolutional Network and Transformer Models

Cited by: 1
Authors
Belal, Mohammad [1]
Hassan, Taimur [2]
Ahmed, Abdelfatah [1]
Aljarah, Ahmad [1]
Alsheikh, Nael [1]
Hussain, Irfan [1]
Affiliations
[1] Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
[2] Abu Dhabi University, Abu Dhabi, United Arab Emirates
Source
2024 IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2024), 2024
Keywords
Segmentation
DOI
10.1109/AVSS61716.2024.10672566
CLC classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Human activity recognition is a crucial area of research that involves understanding human movements using computer and machine vision technology. Deep learning has emerged as a powerful tool for this task, with models such as Convolutional Neural Networks (CNNs) and Transformers being employed to capture various aspects of human motion. A key contribution of this work is demonstrating the effectiveness of feature fusion in improving human activity recognition accuracy, which has important implications for building more accurate and robust activity recognition systems. This approach addresses a common limitation in the field: existing models often fail to capture both spatial and temporal features effectively. This work presents an approach for human activity recognition using sensor data from four distinct datasets: HuGaDB, PKU-MMD, LARa, and TUG. Two models, a Parameter-Optimized Multi-Stage Graph Convolutional Network (PO-MS-GCN) and a Transformer, were trained and evaluated on each dataset to compute accuracy and F1-score. Subsequently, the features from the last layer of each model were combined and fed into a classifier. The findings show that the PO-MS-GCN outperforms state-of-the-art models in human activity recognition: it achieved 92.7% accuracy and a 95.2% F1-score on HuGaDB and 93.2% accuracy and a 98.3% F1-score on TUG, while LARa and PKU-MMD yielded lower accuracies of 64.31% and 69%, respectively, with corresponding F1-scores of 40.63% and 48.16%. Moreover, feature fusion exceeded the PO-MS-GCN's results on the PKU-MMD, LARa, and TUG datasets.
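The fusion step described in the abstract (combining the last-layer features of the PO-MS-GCN and the Transformer and feeding them into a classifier) can be illustrated with a minimal PyTorch sketch. The feature widths, the concatenation operator, the classifier head, and the class count below are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

# Illustrative sizes only; this record does not report the actual dimensions.
GCN_FEAT_DIM = 256          # assumed width of the PO-MS-GCN's last layer
TRANSFORMER_FEAT_DIM = 256  # assumed width of the Transformer's last layer
NUM_CLASSES = 12            # e.g., HuGaDB defines 12 activities

class FusionClassifier(nn.Module):
    """Concatenate last-layer features from two backbones and classify."""

    def __init__(self, gcn_dim: int, trans_dim: int, num_classes: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(gcn_dim + trans_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, gcn_feats: torch.Tensor,
                trans_feats: torch.Tensor) -> torch.Tensor:
        # Feature-level fusion: concatenate along the channel axis.
        fused = torch.cat([gcn_feats, trans_feats], dim=-1)
        return self.head(fused)

# Dummy features stand in for the outputs of the two trained backbones.
gcn_feats = torch.randn(8, GCN_FEAT_DIM)
trans_feats = torch.randn(8, TRANSFORMER_FEAT_DIM)
model = FusionClassifier(GCN_FEAT_DIM, TRANSFORMER_FEAT_DIM, NUM_CLASSES)
logits = model(gcn_feats, trans_feats)  # shape: (8, NUM_CLASSES)
```

Concatenation is only one plausible fusion operator; the paper's exact combination scheme and classifier are not specified in this record.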
Pages: 7
References
24 in total
[11] Kumar, Sateesh. Unsupervised action segmentation by joint representation learning and online clustering. 2021.
[12] Lea, Colin; Flynn, Michael D.; Vidal, Rene; Reiter, Austin; Hager, Gregory D. Temporal Convolutional Networks for Action Segmentation and Detection. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017, pp. 1003-1012.
[13] Liu, Chunhui. PKU-MMD: A large scale benchmark for continuous multi-modal human action understanding. 2017.
[14] Liu, Daochang. Diffusion action segmentation. 2023.
[15] Memon, Fayaz A.; Khan, Umair A.; Shaikh, Asadullah; Alghamdi, Abdullah; Kumar, Pardeep; Alrizq, Mesfer. Predicting Actions in Videos and Action-Based Segmentation Using Deep Learning. IEEE Access, 2021, 9, pp. 106918-106932.
[16] Mohsen, Saeed. Recognition of human activity using GRU deep learning algorithm. Multimedia Tools and Applications, 2023, 82(30), pp. 47733-47749.
[17] Mungoli, Neelesh. Adaptive feature fusion: Enhancing generalization in deep learning models. 2023.
[18] Niemann, Friedrich; Reining, Christopher; Rueda, Fernando Moya; Nair, Nilah Ravi; Steffens, Janine Anika; Fink, Gernot A.; ten Hompel, Michael. LARa: Creating a Dataset for Human Activity Recognition in Logistics Using Semantic Attributes. Sensors, 2020, 20(15), pp. 1-42.
[19] Podsiadlo, D.; Richardson, S. The Timed Up and Go: A Test of Basic Functional Mobility for Frail Elderly Persons. Journal of the American Geriatrics Society, 1991, 39(2), pp. 142-148.
[20] Rahayu, Endang Sri; Yuniarno, Eko Mulyanto; Purnama, I. Ketut Eddy; Purnomo, Mauridhi Hery. Human activity classification using deep learning based on 3D motion feature. Machine Learning with Applications, 2023, 12.