Temporal-Channel Attention and Convolution Fusion for Skeleton-Based Human Action Recognition

Cited by: 0

Authors:

Liang, Chengwu [1,2]

Yang, Jie [1,2]

Du, Ruolin [3]

Hu, Wei [1,2]

Hou, Ning [1]

Affiliations:

[1] Henan Univ Urban Construct, Sch Elect & Control Engn, Pingdingshan 467036, Henan, Peoples R China

[2] China Three Gorges Univ, Coll Elect Engn & New Energy, Yichang 443002, Hubei, Peoples R China

[3] Nantong Univ, Sch Transportat & Civil Engn, Nantong 226019, Jiangsu, Peoples R China

Source:

IEEE Access | 2024 / Vol. 12

Funding:

National Natural Science Foundation of China

Keywords:

Skeleton; Feature extraction; Convolution; Heating systems; Three-dimensional displays; Stacking; Data models; Skeleton-based; action recognition; attention mechanism; convolutional neural network; multi-scale convolution;

DOI:

10.1109/ACCESS.2024.3389499

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Human Action Recognition (HAR) based on skeleton sequences has attracted much attention because skeletal data are robust and insensitive to background. Convolutional neural networks (CNNs) for spatio-temporal representation learning have been widely used for skeleton-based HAR. However, long-term spatio-temporal modeling and attention to action category-specific features have not been fully exploited. To explore the potential of CNNs for skeleton-based HAR, a novel CNN architecture with temporal-channel attention and convolution fusion is proposed. Specifically, the network architecture is composed of two novel modules, the Temporal-Channel Attention (TCA) module and the Multiscale Temporal Convolution Fusion (MTCF) module. The TCA module is designed to generate a temporal-channel attention matrix for different visual channels and temporal features, encouraging the CNN to focus on learning critical category-associated feature representations. Along the channel dimension, the MTCF module adopts grouped residual connections to flexibly extend the convolutional temporal receptive field without introducing additional parameters. By reverse stacking, the MTCF module creates bidirectional information interaction across channels, compensating for the receptive field and information imbalance between subgroups from different branches. The proposed method was evaluated on three benchmark datasets: NTU RGB-D, NTU RGB-D120 and FineGYM. The results show that the proposed TCA-MTCF method improves the CNN's ability to model long-term temporal features of skeleton sequences, achieving state-of-the-art performance for HAR.
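
As a rough illustration of why grouped residual connections can widen the temporal receptive field, and why a reversed pass can reduce the imbalance between subgroups, the sketch below computes receptive fields only. It is not the authors' implementation. It assumes a Res2Net-style scheme in which each channel subgroup is summed with the output of the previously processed subgroup and then passed through one temporal convolution with stride 1 and dilation 1. The function name `subgroup_receptive_fields`, the choice of 4 subgroups, the kernel width of 3, and the element-wise maximum used to combine the two passes are all hypothetical choices made for illustration.

```python
def subgroup_receptive_fields(num_groups, kernel_size, reverse=False):
    """Temporal receptive field (in frames) reached by each channel subgroup.

    Assumed scheme (not taken from the paper): subgroups are processed in
    order; each one is added to the previous subgroup's output and then
    convolved over time with a kernel of width `kernel_size`
    (stride 1, dilation 1).
    """
    order = list(range(num_groups))
    if reverse:
        order.reverse()  # process subgroups from last to first

    fields = [0] * num_groups
    carried = 1  # raw input features cover a single frame
    for idx in order:
        # The sum of the subgroup's own features (field 1) and the previous
        # output (field `carried`) has field `carried`; one convolution of
        # width k then extends it by k - 1 frames.
        carried = carried + (kernel_size - 1)
        fields[idx] = carried
    return fields


forward = subgroup_receptive_fields(num_groups=4, kernel_size=3)
backward = subgroup_receptive_fields(num_groups=4, kernel_size=3, reverse=True)

# Hypothetical fusion: if both passes feed each subgroup, the subgroup can
# draw on the wider of the two receptive fields.
fused = [max(f, b) for f, b in zip(forward, backward)]

print("forward :", forward)
print("backward:", backward)
print("fused   :", fused)
```

Under these assumptions a single pass leaves the first subgroup with a much narrower receptive field than the last, while combining a forward and a reversed pass narrows that gap, which is one way to read the abstract's claim about compensating for imbalance between subgroups.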

Pages: 64937-64948

Page count: 12
