Skeleton-based Human Action Recognition via Large-kernel Attention Graph Convolutional Network

Cited by: 64

Authors:

Liu, Yanan [1]

Zhang, Hao [1]

Li, Yanqiu [1]

He, Kangjian [1]

Xu, Dan [1]

Affiliations:

[1] Yunnan University, School of Information Science and Engineering, Kunming, People's Republic of China

Source:

IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS | 2023 / Vol. 29 / No. 05

Funding:

National Natural Science Foundation of China;

Keywords:

Skeleton; Convolution; Kernel; Adaptation models; Joints; Topology; Task analysis; human skeleton; action recognition; large kernels; graph convolution;

DOI:

10.1109/TVCG.2023.3247075

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification codes:

081202; 0835

Abstract:

Skeleton-based human action recognition has broad application prospects in virtual reality, because skeleton data is more resistant to noise such as background interference and camera-angle changes. Notably, recent works treat the human skeleton as a non-grid representation, e.g., a skeleton graph, and then learn spatio-temporal patterns via graph convolution operators. Still, stacked graph convolutions play only a marginal role in modeling long-range dependencies that may contain crucial action semantic cues. In this work, we introduce a skeleton large kernel attention operator (SLKA), which can enlarge the receptive field and improve channel adaptability without greatly increasing the computational burden. A spatiotemporal SLKA module (ST-SLKA) is then integrated, which can aggregate long-range spatial features and learn long-distance temporal correlations. Further, we design a novel skeleton-based action recognition network architecture called the spatiotemporal large-kernel attention graph convolution network (LKA-GCN). In addition, because large-movement frames may carry significant action information, this work proposes a joint movement modeling strategy (JMM) to focus on valuable temporal interactions. On the NTU-RGBD 60, NTU-RGBD 120 and Kinetics-Skeleton 400 action datasets, LKA-GCN achieves state-of-the-art performance.
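
The abstract does not say how the joint movement modeling strategy (JMM) measures movement. As a rough illustration of what "large-movement frames" could mean, the sketch below scores each frame by the summed Euclidean displacement of its joints relative to the previous frame and picks the highest-scoring frames. The function names, the displacement measure, and the toy sequence are all assumptions made for illustration; this is not the paper's method.

```python
import math

def frame_motion(seq):
    """Score motion between consecutive frames (illustrative assumption).

    seq: list of frames; each frame is a list of (x, y, z) joint coordinates.
    Returns one score per consecutive frame pair: the sum over joints of the
    Euclidean distance each joint moved.
    """
    scores = []
    for prev, curr in zip(seq, seq[1:]):
        scores.append(sum(math.dist(p, c) for p, c in zip(prev, curr)))
    return scores

def top_motion_frames(seq, k):
    """Return indices (into seq) of the k frames with the largest motion
    relative to their predecessor, in temporal order."""
    scores = frame_motion(seq)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # scores[i] describes the step from frame i to frame i + 1
    return sorted(i + 1 for i in ranked)

# Hypothetical toy sequence: 4 frames, 2 joints each.
seq = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # nothing moves
    [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)],  # joint 0 moves by 5
    [(3.0, 4.0, 0.0), (1.0, 1.0, 0.0)],  # joint 1 moves by 1
]

motion = frame_motion(seq)
selected = top_motion_frames(seq, 2)
```

With this toy input, frames 2 and 3 are selected because they follow the two largest displacements. A real model would more plausibly use such a score to weight temporal features than to discard frames outright.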


Pages: 2575-2585

Page count: 11

