ASMGCN: Attention-Based Semantic-Guided Multistream Graph Convolution Network for Skeleton Action Recognition

Cited by: 2
Authors
Zhang, Moyan [1]
Quan, Zhenzhen [1]
Wang, Wei [1]
Chen, Zhe [1]
Guo, Xiaoshan [1]
Li, Yujun [1]
Affiliation
[1] Shandong Univ, Sch Informat Sci & Engn, Qingdao 266237, Shandong, Peoples R China
Keywords
Skeleton; Feature extraction; Convolution; Sensors; Data models; Data mining; Joints; Attention mechanism; graph convolution network (GCN); multistream network; skeleton-based action recognition
DOI
10.1109/JSEN.2024.3388154
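Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract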
In recent years, action recognition from human skeletal data using spatio-temporal graph convolution models has made significant progress. However, current methods tend to prioritize spatial graph convolution, which leaves valuable information in skeletal data underutilized. This limits the model's ability to capture complex data patterns, especially in time-series data, and ultimately reduces recognition accuracy. To address these issues, this article introduces an attention-based semantic-guided multistream graph convolution network (ASMGCN), which extracts deep features from skeletal data more fully. Specifically, ASMGCN incorporates a novel temporal convolutional module featuring an attention mechanism and a multiscale residual network, which dynamically adjusts the weights between skeleton graphs at different time points and thereby captures relational features better. In addition, semantic information is introduced into the loss function, enhancing the model's ability to distinguish similar actions. Furthermore, the coordinate information of different joints within the same frame is used to generate new relative position features, called the centripetal and centrifugal streams, based on the center of gravity. These features are integrated with the original position and motion features of the skeleton, including joints and bones, enriching the inputs to the GCN. Experimental results on the NW-UCLA, NTU RGB+D (NTU60), and NTU RGB+D 120 (NTU120) datasets demonstrate that ASMGCN outperforms other state-of-the-art (SOTA) human action recognition (HAR) methods, indicating its potential for advancing skeleton-based action recognition.
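The centripetal and centrifugal streams mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption: the center of gravity is taken as the unweighted mean of all joints in a frame, the centrifugal vector points from that center to each joint, and the centripetal vector is its opposite. The function names `center_of_gravity` and `relative_streams` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) coordinates of one joint
Frame = List[Joint]                  # all joints of one skeleton in one frame


def center_of_gravity(frame: Frame) -> Joint:
    """Unweighted mean of the joint coordinates (assumed definition)."""
    n = len(frame)
    return (
        sum(j[0] for j in frame) / n,
        sum(j[1] for j in frame) / n,
        sum(j[2] for j in frame) / n,
    )


def relative_streams(frame: Frame) -> Tuple[Frame, Frame]:
    """Return (centripetal, centrifugal) vectors for every joint in a frame.

    Assumed convention: centrifugal = joint - center (pointing outward),
    centripetal = center - joint (pointing inward).
    """
    c = center_of_gravity(frame)
    centrifugal = [(j[0] - c[0], j[1] - c[1], j[2] - c[2]) for j in frame]
    centripetal = [(c[0] - j[0], c[1] - j[1], c[2] - j[2]) for j in frame]
    return centripetal, centrifugal


# Toy frame with three joints; its center of gravity is (1.0, 1.0, 0.0).
toy_frame: Frame = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, 0.0)]
inward, outward = relative_streams(toy_frame)
```

In a multistream setup, such per-frame vectors would be stacked over time and fed to the network alongside the joint, bone, and motion inputs; how ASMGCN weights or fuses them is not stated in this record.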
Pages: 20064-20075
Page count: 12