Dynamic Semantic-Based Spatial-Temporal Graph Convolution Network for Skeleton-Based Human Action Recognition

Cited by: 0

Authors:

Xie, Jianyang [1]

Meng, Yanda [2]

Zhao, Yitian [3]

Nguyen, Anh [4]

Yang, Xiaoyun [5]

Zheng, Yalin [2]

Affiliations:

[1] Univ Liverpool, Sch EEECS, CDT Distributed Algorithms, Liverpool L7 8TS, England

[2] Univ Liverpool, Dept Eye & Vis Sci, Liverpool L7 8TS, England

[3] Ningbo Inst Ind Technol, Ningbo 312501, Peoples R China

[4] Univ Liverpool, Dept Comp Sci, Liverpool L7 8TS, England

[5] Remark Holdings, Las Vegas, NV 89106 USA

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2024 / Vol. 33

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

Semantics; Skeleton; Convolution; Human activity recognition; Graph neural networks; Encoding; Adaptation models; Legged locomotion; Joints; Image coding; Human action recognition; skeleton-based; semantics encoding; joints/edge type; frames occurrence order; graph convolution network;

DOI:

10.1109/TIP.2024.3497837

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Human action recognition is an essential topic in computer vision and image processing. Graph convolutional networks (GCNs) have attracted significant attention and achieved noteworthy performance in skeleton-based human action recognition. However, most previous graph-based methods refine the skeleton topology without considering the types of the joints and edges or the occurrence order of the frames, which leaves them unable to fully represent intrinsic semantic information. To address this limitation, we propose a dynamic semantic-based spatial-temporal graph convolution network (DS-STGCN). DS-STGCN has two dynamic semantic modules, one for the spatial context and one for the temporal context. Specifically, the joint and edge types are encoded implicitly in the spatial module, and the occurrence order of frames is encoded implicitly in the temporal module. Extensive experiments on four datasets, NTU-RGB+D 60, NTU-RGB+D 120, Kinetics-400, and FineGYM, show that the two proposed semantic modules bring consistent improvements in recognition performance across various backbones. DS-STGCN also surpasses state-of-the-art methods on these datasets. On the more challenging Kinetics-400 dataset in particular, our model outperforms other state-of-the-art GCN-based methods by a large margin. The code has been released at https://github.com/davelailai/DS-STGCN.

Pages: 6691-6704

Page count: 14
