Skeleton-Based Action Recognition Using Multibranch Adaptive Graph Convolutional Network With Pose Refinement

Cited by: 0
Authors
Chen, Luefeng [1 ,2 ,3 ]
Li, Jiazhuo [1 ,2 ,3 ]
Li, Min [1 ,2 ,3 ]
Wu, Min [1 ,2 ,3 ]
Pedrycz, Witold [4 ,5 ,6 ]
Hirota, Kaoru [7 ]
Affiliations
[1] China Univ Geosci, Sch Automat, Wuhan 430074, Peoples R China
[2] Hubei Key Lab Adv Control & Intelligent Automat Co, Wuhan 430074, Peoples R China
[3] Minist Educ, Engn Res Ctr Intelligent Technol Geoexplorat, Wuhan 430074, Peoples R China
[4] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB T6G 2R3, Canada
[5] Macau Univ Sci & Technol, Inst Syst Engn, Taipa 999078, Macau, Peoples R China
[6] Istinye Univ, Res Ctr Performance & Prod Anal, TR-34010 Istanbul, Turkiye
[7] Tokyo Inst Technol, Yokohama 2268502, Japan
Funding
National Natural Science Foundation of China;
Keywords
Skeleton; Feature extraction; Adaptation models; Vectors; Convolution; Human activity recognition; Graph convolutional networks; Attention mechanisms; Adaptive systems; Accuracy; Action recognition; adaptive; pose refinement; skeleton based;
DOI
10.1109/TCSS.2025.3566733
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
A multibranch adaptive graph convolutional network is proposed for human action recognition, combining graph convolutional networks (GCNs), adaptive learning, and multibranch feature extraction. The adaptive graph convolution module adaptively changes parameters during training, so the network is no longer limited to the inherent physical connections of the skeleton and becomes more flexible. Shallow-level features (skeleton joints) are integrated with deep-level features, including skeleton information, motion information, and motion difference information, which allows the model to capture both the spatial and temporal dynamics of human actions and yields a more comprehensive representation of action features. The addition of second-order features lets the skeletal data be exploited more fully. A spatio-temporal attention mechanism enables the model to focus on key frames and skeleton joints, which enhances its discriminative ability and improves recognition of subtle variations and important cues in human actions. The pose refinement (attitude correction) module makes the input data to the network more reasonable and reduces the interference of noise. In experiments on the benchmark datasets NTU-RGB-D and Kinetics-400, the method achieves significant improvements in action recognition performance compared with existing approaches. On Kinetics-400, it achieves Top-1 and Top-5 recognition rates of 36.5% and 59.6%, respectively, an improvement of about 1% over the state-of-the-art method. On NTU-RGB-D, it achieves recognition rates of 95.8% and 89.4% under the X-view and X-subject modes, respectively. These results validate the effectiveness of the multibranch adaptive graph convolutional network for human action recognition tasks.
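The abstract names several input branches (joints, second-order features, motion) and an adaptive graph that goes beyond the physical skeleton connections. The sketch below illustrates these ideas under stated assumptions. It is not the authors' implementation: the 5-joint chain in `EDGES`, the function names, and the additive form of the adaptive adjacency (a fixed skeleton matrix plus a learned offset) are hypothetical choices made for illustration only.

```python
# Hypothetical 5-joint chain skeleton as (child, parent) pairs; not from the paper.
EDGES = [(1, 0), (2, 1), (3, 2), (4, 3)]


def bone_features(frames, edges=EDGES):
    """Second-order features: each child joint minus its parent joint.

    frames: list of frames; each frame is a list of joints;
    each joint is a list of coordinates. Root joints keep zero vectors.
    """
    out = []
    for joints in frames:
        bones = [[0.0] * len(j) for j in joints]
        for child, parent in edges:
            bones[child] = [c - p for c, p in zip(joints[child], joints[parent])]
        out.append(bones)
    return out


def motion_features(frames):
    """Motion features: difference between consecutive frames.

    The last frame has no successor, so it is padded with zeros.
    """
    out = []
    for t, joints in enumerate(frames):
        if t + 1 < len(frames):
            nxt = frames[t + 1]
            out.append([[b - a for a, b in zip(j0, j1)]
                        for j0, j1 in zip(joints, nxt)])
        else:
            out.append([[0.0] * len(j) for j in joints])
    return out


def adaptive_adjacency(a_fixed, b_learned):
    """Assumed additive form: fixed skeleton adjacency plus a learned offset.

    A nonzero entry in b_learned can link joints that share no physical bone.
    """
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(a_fixed, b_learned)]
```

Applying `motion_features` to the output of `bone_features` would give a motion difference of the second-order stream, which is one plausible reading of how the branches in the abstract could be combined.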
Pages: 11