DSTSA-GCN: Advancing skeleton-based gesture recognition with semantic-aware spatio-temporal topology modeling

Cited by: 1
Authors
Cui, Hu [1]
Huang, Renjing [2]
Zhang, Ruoyu [3]
Hayama, Tessai [1]
Affiliations
[1] Nagaoka Univ Technol, 1603-1 Kamitiomioka, Nagaoka, Niigata 9402188, Japan
[2] Guizhou Elect Technol Coll, Guiyang 550025, Guizhou, Peoples R China
[3] Guizhou Univ, Guiyang 550025, Guizhou, Peoples R China
Keywords
Human action recognition; Gesture recognition; Graph convolution networks; Spatial-temporal model; Neural network
DOI
10.1016/j.neucom.2025.130066
CLC Classification Number
TP18 [Artificial intelligence theory]
Subject Classification Code
081104; 0812; 0835; 1405
Abstract
Graph convolutional networks (GCNs) have emerged as a powerful tool for skeleton-based action and gesture recognition, thanks to their ability to model spatial and temporal dependencies in skeleton data. However, existing GCN-based methods face critical limitations: (1) they lack effective spatio-temporal topology modeling that captures dynamic variations in skeletal motion, and (2) they struggle to model multiscale structural relationships beyond local joint connectivity. To address these issues, we propose a novel framework called Dynamic Spatial-Temporal Semantic Awareness Graph Convolutional Network (DSTSA-GCN). DSTSA-GCN introduces three key modules: Group Channel-wise Graph Convolution (GC-GC), Group Temporal-wise Graph Convolution (GT-GC), and Multi-Scale Temporal Convolution (MS-TCN). GC-GC and GT-GC operate in parallel to independently model channel-specific and frame-specific correlations, enabling robust topology learning that accounts for temporal variations. Additionally, both modules employ a grouping strategy to adaptively capture multiscale structural relationships. Complementing this, MS-TCN enhances temporal modeling through group-wise temporal convolutions with diverse receptive fields. Extensive experiments demonstrate that DSTSA-GCN significantly improves the topology modeling capabilities of GCNs, achieving state-of-the-art performance on benchmark datasets for gesture and action recognition, including SHREC'17 Track, DHG-14/28, NTU-RGB+D, NTU-RGB+D-120 and NW-UCLA. The code will be publicly available at https://hucui2022.github.io/dstsa_gcn/.
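The abstract describes the block-level architecture only at a high level. The following is a minimal, illustrative PyTorch sketch of how a block with parallel GC-GC and GT-GC branches followed by MS-TCN could be arranged; it is not the authors' implementation (their code is at the URL above). Only the module names and the parallel-spatial-then-temporal arrangement come from the abstract; all tensor shapes, layer widths, the group count, the dot-product affinity in GT-GC, the dilation set in MS-TCN, and the summation/residual fusion are assumptions introduced here for illustration.

import torch
import torch.nn as nn


class GroupChannelGraphConv(nn.Module):
    # GC-GC sketch: one learnable joint-to-joint topology per channel group,
    # shared across frames (channel-specific correlations; details assumed).
    def __init__(self, channels, num_joints, groups=4):
        super().__init__()
        self.groups = groups
        self.adj = nn.Parameter(torch.randn(groups, num_joints, num_joints) * 0.01)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):  # x: (N, C, T, V)
        n, c, t, v = x.shape
        xg = x.view(n, self.groups, c // self.groups, t, v)
        out = torch.einsum('ngctv,gvw->ngctw', xg, self.adj.softmax(-1))
        return self.proj(out.reshape(n, c, t, v))


class GroupTemporalGraphConv(nn.Module):
    # GT-GC sketch: a frame-specific topology inferred from the input via an
    # embedded dot product (frame-specific correlations; grouping omitted here).
    def __init__(self, channels):
        super().__init__()
        self.theta = nn.Conv2d(channels, channels // 4, kernel_size=1)
        self.phi = nn.Conv2d(channels, channels // 4, kernel_size=1)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):  # x: (N, C, T, V)
        q = self.theta(x).permute(0, 2, 3, 1)   # (N, T, V, C')
        k = self.phi(x).permute(0, 2, 1, 3)     # (N, T, C', V)
        adj = torch.softmax(q @ k, dim=-1)      # (N, T, V, V), one graph per frame
        out = torch.einsum('nctv,ntvw->nctw', x, adj)
        return self.proj(out)


class MultiScaleTemporalConv(nn.Module):
    # MS-TCN sketch: parallel temporal convolution branches with different
    # dilations, i.e. group-wise temporal modeling with diverse receptive fields.
    def __init__(self, channels, dilations=(1, 2, 3, 4)):
        super().__init__()
        branch_c = channels // len(dilations)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, branch_c, kernel_size=1),
                nn.Conv2d(branch_c, branch_c, kernel_size=(5, 1),
                          padding=(2 * d, 0), dilation=(d, 1)),
            )
            for d in dilations
        ])

    def forward(self, x):  # x: (N, C, T, V)
        return torch.cat([branch(x) for branch in self.branches], dim=1)


class DSTSABlock(nn.Module):
    # One block: GC-GC and GT-GC run in parallel on the same input, their
    # outputs are fused, and MS-TCN then models temporal dependencies.
    def __init__(self, channels, num_joints):
        super().__init__()
        self.gc_gc = GroupChannelGraphConv(channels, num_joints)
        self.gt_gc = GroupTemporalGraphConv(channels)
        self.ms_tcn = MultiScaleTemporalConv(channels)

    def forward(self, x):
        spatial = self.gc_gc(x) + self.gt_gc(x)  # fusion by summation (assumption)
        return self.ms_tcn(spatial) + x          # residual connection (assumption)


if __name__ == "__main__":
    # Toy skeleton input: batch 2, 64 channels, 32 frames, 22 hand joints.
    x = torch.randn(2, 64, 32, 22)
    print(DSTSABlock(channels=64, num_joints=22)(x).shape)  # torch.Size([2, 64, 32, 22])

The point the sketch is meant to capture is the complementarity stated in the abstract: the GC-GC topology is specific to channel groups but shared across frames, while the GT-GC topology is inferred per frame, and temporal aggregation happens only afterwards in MS-TCN.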
Pages: 14