Temporal-Channel Attention and Convolution Fusion for Skeleton-Based Human Action Recognition

Cited by: 0
Authors
Liang, Chengwu [1, 2]
Yang, Jie [1, 2]
Du, Ruolin [3]
Hu, Wei [1, 2]
Hou, Ning [1]
Affiliations
[1] Henan Univ Urban Construct, Sch Elect & Control Engn, Pingdingshan 467036, Henan, Peoples R China
[2] China Three Gorges Univ, Coll Elect Engn & New Energy, Yichang 443002, Hubei, Peoples R China
[3] Nantong Univ, Sch Transportat & Civil Engn, Nantong 226019, Jiangsu, Peoples R China
Source
IEEE ACCESS | 2024 / Vol. 12
Funding
National Natural Science Foundation of China;
Keywords
Skeleton; Feature extraction; Convolution; Heating systems; Three-dimensional displays; Stacking; Data models; Skeleton-based; action recognition; attention mechanism; convolutional neural network; multi-scale convolution;
DOI
10.1109/ACCESS.2024.3389499
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Human Action Recognition (HAR) based on skeleton sequences has attracted much attention due to the robustness and background insensitivity of skeletal data. Convolutional neural networks (CNNs) for spatio-temporal representation learning have been widely used for skeleton-based HAR. However, long-term spatio-temporal modeling and action category-specific feature attention have not been fully exploited. To explore the potential of CNNs for skeleton-based HAR, a novel CNN architecture with temporal-channel attention and convolution fusion is proposed. Specifically, the network architecture is composed of two novel modules, the Temporal-Channels Attention Module (TCA) and the Multiscale Temporal Convolution Fusion module (MTCF). The TCA module is designed to generate a temporal-channel attention matrix for different visual channels and temporal features, motivating the CNN to focus on learning critical category-associated feature representations. Along the channels, the MTCF module adapts grouped residual connections to flexibly extend the convolutional temporal receptive field without introducing additional parameters. By reverse stacking, the MTCF module creates a bidirectional information interaction among channels, compensating for the receptive field and information imbalance between subgroups from different branches. The proposed method was evaluated on three benchmark datasets: NTU RGB-D, NTU RGB-D120 and FineGYM. The results show that the proposed TCA-MTCF method improves the CNNs' ability to model long-term temporal features of skeleton sequences, achieving state-of-the-art performance for HAR.
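
The abstract does not give the MTCF layer layout, so the following is only a rough sketch of the receptive-field argument, not the authors' implementation. It assumes a Res2Net-style hierarchy in which the first channel subgroup is passed through unchanged and each later subgroup applies one temporal convolution to its own input plus the previous subgroup's output. It also assumes that "reverse stacking" means running the same hierarchy in the opposite subgroup order and fusing both branches. The function names, the group count, and the kernel size are hypothetical.

```python
def forward_receptive_fields(num_groups, kernel_size):
    """Temporal receptive field (in frames) of each channel subgroup.

    Assumption: subgroup 0 is an identity branch, and subgroup i >= 1 applies
    one temporal convolution to (its input + output of subgroup i - 1).
    """
    fields = [1]  # the identity branch sees a single frame
    for _ in range(1, num_groups):
        # each additional stacked convolution widens the field by (kernel_size - 1)
        fields.append(fields[-1] + kernel_size - 1)
    return fields


def fused_receptive_fields(num_groups, kernel_size):
    """Per-subgroup receptive field after fusing forward and reversed branches.

    Assumption: the reversed branch is the same hierarchy traversed from the
    last subgroup to the first, and fusion lets each subgroup use whichever
    branch saw the wider temporal context.
    """
    forward = forward_receptive_fields(num_groups, kernel_size)
    backward = forward[::-1]
    return [max(f, b) for f, b in zip(forward, backward)]


if __name__ == "__main__":
    print(forward_receptive_fields(4, 3))  # forward-only hierarchy
    print(fused_receptive_fields(4, 3))    # forward + reversed hierarchy
```

Under these assumptions, four subgroups with a kernel size of 3 have forward-only receptive fields of 1, 3, 5 and 7 frames, which is the imbalance the abstract refers to. After fusing with the reversed branch, every subgroup covers at least 5 frames. The same convolutions are traversed, so the wider coverage does not by itself require a larger kernel.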
Pages: 64937-64948
Page count: 12