Spatio-Temporal Transformer with Kolmogorov-Arnold Network for Skeleton-Based Hand Gesture Recognition

Citations: 0

Authors:

Han, Pengcheng [1]

He, Xin [1]

Matsumaru, Takafumi [1]

Dutta, Vibekananda [2,3]

Affiliations:

[1] Waseda Univ, Grad Sch Informat Prod & Syst, Kitakyushu 8080135, Japan

[2] Warsaw Univ Technol, Inst Micromech & Photon, Fac Mechatron, PL-00661 Warsaw, Poland

[3] Waseda Univ, Waseda Inst Adv Study, Tokyo 1698050, Japan

Source:

SENSORS | 2025 / Vol. 25 / Issue 03

Keywords:

hand gesture recognition; human-computer interaction (HCI); skeleton based; deep learning; graph convolutional networks; transformer; attention mechanism; feature extraction; continuous hand gesture recognition;

DOI:

10.3390/s25030702

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Discipline classification code:

070302 ; 081704 ;

Abstract:

Manually crafted features are often subjective, insufficiently accurate, or not robust enough for recognition. Meanwhile, existing deep learning methods often overlook the structural and dynamic characteristics of the human hand and fail to fully explore the contextual information of joints in both the spatial and temporal domains. To effectively capture dependencies between hand joints that are not adjacent but may be connected, it is essential to learn long-term relationships. This study proposes ST-KT, a skeleton-based hand gesture recognition framework that combines a spatio-temporal graph convolution network with a transformer built on the Kolmogorov-Arnold Network (KAN) model. It incorporates spatio-temporal graph convolution network (ST-GCN) modules and a spatio-temporal transformer module with KAN (KAN-Transformer). The ST-GCN modules, each comprising a spatial graph convolution network (SGCN) and a temporal convolution network (TCN), extract primary features from skeleton sequences by leveraging the strength of graph convolutional networks in the spatio-temporal domain. A spatio-temporal position embedding method integrates node features, enriching the representations with node identities and temporal information. The transformer layer includes a spatial KAN-Transformer (S-KT) and a temporal KAN-Transformer (T-KT), which further extract joint features by learning edge weights and node embeddings, providing richer feature representations and a capacity for nonlinear modeling. The method was evaluated on two challenging skeleton-based dynamic gesture datasets, achieving an accuracy of 97.5% on the SHREC'17 track dataset and 94.3% on the DHG-14/28 dataset. These results demonstrate that the proposed ST-KT method effectively captures dynamic skeleton changes and complex joint relationships.
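The abstract describes ST-GCN modules that pair a spatial graph convolution over hand joints with a temporal convolution over frames. The following is a minimal NumPy sketch of that general idea only, not the authors' implementation: the 5-joint toy skeleton, the function names, the symmetric adjacency normalization, and the fixed smoothing kernel are all assumptions made for illustration, and the KAN-Transformer stage is not covered.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    adj = np.eye(num_joints)  # self-loops keep each joint's own features
    for i, j in edges:
        adj[i, j] = 1.0
        adj[j, i] = 1.0
    deg_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]

def spatial_graph_conv(x, adj_norm, weight):
    """x: (frames, joints, in_channels); weight: (in_channels, out_channels)."""
    # Aggregate neighbouring joints within each frame, then mix channels.
    aggregated = np.einsum("vu,tuc->tvc", adj_norm, x)
    return np.maximum(aggregated @ weight, 0.0)  # ReLU

def temporal_conv(x, kernel):
    """Apply one odd-length 1D kernel along the frame axis with zero padding."""
    pad = len(kernel) // 2
    padded = np.pad(x, ((pad, pad), (0, 0), (0, 0)))
    return sum(kernel[i] * padded[i:i + x.shape[0]] for i in range(len(kernel)))

# Hypothetical toy skeleton: joint 0 is the wrist, with two 2-joint chains.
toy_edges = [(0, 1), (1, 3), (0, 2), (2, 4)]
adj_norm = normalized_adjacency(toy_edges, num_joints=5)

rng = np.random.default_rng(0)
sequence = rng.normal(size=(8, 5, 3))   # 8 frames, 5 joints, 3D coordinates
weight = rng.normal(size=(3, 4))        # random stand-in for learned weights

features = spatial_graph_conv(sequence, adj_norm, weight)
smoothed = temporal_conv(features, np.array([0.25, 0.5, 0.25]))
print(features.shape, smoothed.shape)
```

In a trained model the channel-mixing weights and the temporal kernels would be learned, and the temporal convolution would normally mix channels as well; here a single shared kernel keeps the sketch short.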


Pages: 23

50 entries in total

[1] Spatial-Temporal Synchronous Transformer for Skeleton-Based Hand Gesture Recognition [J].

Zhao, Dongdong ;

Li, Hongli ;

Yan, Shi .

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (03) :1403-1412

[2] Recognizing Skeleton-Based Hand Gestures by a Spatio-Temporal Network [J].

Li, Xin ;

Liao, Jun ;

Liu, Li .

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: APPLIED DATA SCIENCE TRACK, PT IV, 2021, 12978 :151-167

[3] Decoupled spatio-temporal grouping transformer for skeleton-based action recognition [J].

Sun, Shengkun ;

Jia, Zihao ;

Zhu, Yisheng ;

Liu, Guangcan ;

Yu, Zhengtao .

VISUAL COMPUTER, 2024, 40 (08) :5733-5745

[4] SPD Siamese Neural Network for Skeleton-based Hand Gesture Recognition [J].

Akremi, Mohamed Sanim ;

Slama, Rim ;

Tabia, Hedi .

PROCEEDINGS OF THE 17TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 4, 2022, :394-402

[5] Two-stream spatio-temporal GCN-transformer networks for skeleton-based action recognition [J].

Chen, Dong ;

Chen, Mingdong ;

Wu, Peisong ;

Wu, Mengtao ;

Zhang, Tao ;

Li, Chuanqi .

SCIENTIFIC REPORTS, 2025, 15 (01)

[6] Glimpse and Zoom: Spatio-Temporal Focused Dynamic Network for Skeleton-Based Action Recognition [J].

Zhao, Zhifu ;

Chen, Ziwei ;

Li, Jianan ;

Wang, Xiaotian ;

Xie, Xuemei ;

Huang, Lei ;

Zhang, Wanxin ;

Shi, Guangming .

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) :5616-5629

[7] Lightweight Multiscale Spatio-Temporal Graph Convolutional Network for Skeleton-Based Action Recognition [J].

Zheng, Zhiyun ;

Yuan, Qilong ;

Zhang, Huaizhu ;

Wang, Yizhou ;

Wang, Junfeng .

BIG DATA MINING AND ANALYTICS, 2025, 8 (02) :310-325

[8] Spatio-Temporal Dynamic Attention Graph Convolutional Network Based on Skeleton Gesture Recognition [J].

Han, Xiaowei ;

Cui, Ying ;

Chen, Xingyu ;

Lu, Yunjing ;

Hu, Wen .

ELECTRONICS, 2024, 13 (18)

[9] Spatial temporal graph convolutional networks for skeleton-based dynamic hand gesture recognition [J].

Li, Yong ;

He, Zihang ;

Ye, Xiang ;

He, Zuguo ;

Han, Kangrong .

EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2019, 2019 (01)

