Focusing Fine-Grained Action by Self-Attention-Enhanced Graph Neural Networks With Contrastive Learning

Cited: 21
Authors
Geng, Pei [1 ,2 ]
Lu, Xuequan [3 ]
Hu, Chunyu [4 ]
Liu, Hong [1 ,2 ]
Lyu, Lei [1 ,2 ]
Affiliations
[1] Shandong Normal University, School of Information Science & Engineering, Jinan 250000, China
[2] Shandong Provincial Key Laboratory of Novel Distributed Computer Software, Jinan 250000, China
[3] Deakin University, School of Information Technology, Geelong, VIC 3216, Australia
[4] Qilu University of Technology (Shandong Academy of Sciences), School of Computer Science & Technology, Jinan 250353, China
Funding
National Natural Science Foundation of China;
Keywords
Joints; Convolution; Feature extraction; Bones; Graph neural networks; Data mining; Correlation; Skeleton-based action recognition; graph neural network; attention mechanism; contrastive learning; human action recognition; convolution network
DOI
10.1109/TCSVT.2023.3248782
CLC Classification Numbers
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
With the aid of graph convolutional neural networks and transformer models, skeleton-based human action recognition has achieved remarkable performance. However, the majority of existing works rarely focus on identifying fine-grained motion information (e.g., "read" and "write"). Furthermore, they tend to explore correlations between joints and bones while ignoring angular information. Consequently, the accuracy of most models in recognizing fine-grained actions is still less than desired. To address this issue, we first introduce angular information as a complement to the familiar joint and bone information, and learn the latent dependencies among the three kinds of information with graph neural networks. On this basis, we propose a self-attention-enhanced graph neural network (SAE-GNN), which consists of a kernel-unified graph convolution (KUGC) module and an enhanced attention graph convolution (EAGC) module. The KUGC module is devised to effectively extract rich features from the skeleton information. The EAGC module, consisting of a multi-scale enhanced graph convolution block and a multi-headed self-attention block, is designed to learn the latent high-level semantic information in these features. In addition, we introduce contrastive learning between the two blocks to enhance feature representation by maximizing their mutual information. We conduct extensive experiments on four publicly available datasets, and the results show that our model outperforms state-of-the-art methods in recognizing fine-grained actions.
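As a rough illustration of two ideas named in the abstract, the sketch below (not the authors' implementation) derives bone and angular features from raw joint coordinates as complementary input streams, and computes an InfoNCE-style contrastive loss between two feature branches, here standing in for the multi-scale graph convolution block and the self-attention block of the EAGC module. The (N, C, T, V) tensor layout, the parents table, and the InfoNCE form are assumptions; the paper may define the angular features and the contrastive objective differently.

```python
# Minimal sketch; all shapes, names, and the loss form are assumptions.
import torch
import torch.nn.functional as F


def bone_features(joints: torch.Tensor, parents: list) -> torch.Tensor:
    """Bone vectors: each joint minus its parent joint.

    joints: (N, C, T, V) with C = 3 coordinates, T frames, V joints.
    The root joint is its own parent, so its bone vector is zero.
    """
    return joints - joints[:, :, :, parents]


def angle_features(bones: torch.Tensor, parents: list,
                   eps: float = 1e-6) -> torch.Tensor:
    """Cosine of the angle between each bone and its parent bone.

    bones: (N, 3, T, V) -> (N, 1, T, V), one angular channel per joint.
    """
    parent_bones = bones[:, :, :, parents]
    dot = (bones * parent_bones).sum(dim=1, keepdim=True)
    norms = bones.norm(dim=1, keepdim=True) * parent_bones.norm(dim=1, keepdim=True)
    return dot / (norms + eps)


def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """InfoNCE contrastive loss between two (N, D) feature batches.

    Row i of z1 and row i of z2 come from the same sample (positive pair);
    every other row serves as a negative. Minimizing this loss is a
    standard lower-bound surrogate for maximizing mutual information.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                    # (N, N) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)


# Toy usage on random data: 8 clips, 3D coordinates, 16 frames, 5 joints.
parents = [0, 0, 1, 2, 3]                          # hypothetical kinematic tree
joints = torch.randn(8, 3, 16, 5)
bones = bone_features(joints, parents)
angles = angle_features(bones, parents)            # third input stream
loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
```

In this reading, joints, bones, and angles form the three complementary streams the abstract mentions, and the contrastive term is added to the usual classification loss so the two EAGC branches learn mutually consistent representations.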
Pages: 4754-4768
Number of pages: 15