Human Skeleton Feature Optimizer and Adaptive Structure Enhancement Graph Convolution Network for Action Recognition

Times cited: 19
Authors
Xiong, Xin [1, 2, 3]
Min, Weidong [3, 4]
Wang, Qi [5 ]
Zha, Cheng [6 ]
Affiliations
[1] Nanchang Univ, Affiliated Hosp 1, Informat Dept, Nanchang, Peoples R China
[2] Nanchang Univ, Inst Metaverse, Nanchang 330031, Peoples R China
[3] Jiangxi Key Lab Smart City, Nanchang, Peoples R China
[4] Nanchang Univ, Inst Metaverse, Sch Math & Comp Sci, Nanchang 330047, Peoples R China
[5] Nanchang Univ, Sch Software, Nanchang 330047, Peoples R China
[6] Nanchang Univ, Sch Math & Comp Sci, Nanchang 330031, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Skeleton; Convolution; Data mining; Directed graphs; Smart cities; Kernel; Action recognition; graph convolution network; skeleton feature optimizer; graph structure mask; directed graph mapping; adaptive pooling operation; KNOWLEDGE DISTILLATION; INTERNET;
DOI
10.1109/TCSVT.2022.3201186
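Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code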
0808 ; 0809 ;
Abstract
Human action recognition based on the graph convolution network (GCN) is an active research topic in computer vision. Existing GCN-based methods fail to capture internal implicit information when extracting action features, which leads to over-smoothing in the training stage. These issues result in inaccurate extraction of action features and poor performance. To address these problems, this paper proposes a human skeleton feature optimizer (SFO) and an adaptive structure enhancement graph convolution network (ASE-GCN) for end-to-end action recognition. To obtain discriminative features, the SFO constructs a new skeleton representation through a connection criterion, which extracts the internal implicit information of an action. In the ASE-GCN, the action features of the joint coordinates are extracted by a graph structure mask (GSM), directed graph mapping (DGM), and an adaptive pooling operation (APO). The GSM acts as a regularizer of skeleton structure information to strengthen the representation of the graph structure. The DGM correlates the directed graph with human motion information through kinematic principles, and the APO strengthens the global high-frequency features to alleviate over-smoothing. In experiments on two large-scale public datasets, NTU-RGB+D and Kinetics, the proposed method achieves results comparable or superior to those of state-of-the-art methods.
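The abstract does not give the formulation of the GSM, so the following is only a minimal sketch of the general idea of weighting a skeleton graph with a mask inside one graph convolution step. It is not the authors' ASE-GCN. The function names, the 5-joint chain skeleton, the symmetric normalization, and the all-ones mask are assumptions made purely for illustration.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetrically normalize A + I as D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def masked_graph_conv(x, adj, mask, weight):
    """One graph convolution step: (A_norm * mask) @ x @ weight.

    x:      (num_joints, in_channels) joint features
    adj:    (num_joints, num_joints) skeleton adjacency
    mask:   same shape as adj; a hypothetical stand-in for a learnable mask
    weight: (in_channels, out_channels) feature transform
    """
    a_norm = normalize_adjacency(adj)
    return (a_norm * mask) @ x @ weight


# Toy skeleton (assumption): 5 joints connected in a chain.
num_joints = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
adj = np.zeros((num_joints, num_joints))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

rng = np.random.default_rng(0)
x = rng.standard_normal((num_joints, 3))   # 3D joint coordinates
weight = rng.standard_normal((3, 8))       # 3 input -> 8 output channels
mask = np.ones_like(adj)                   # all-ones mask leaves the graph unchanged

out = masked_graph_conv(x, adj, mask, weight)
print(out.shape)
```

In a trainable model the mask and weight would be parameters updated by gradient descent; here they are fixed arrays so that the step can be run and inspected in isolation.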
Pages: 342 - 353
Number of pages: 12