Constructing Stronger and Faster Baselines for Skeleton-Based Action Recognition

Cited by: 250
Authors
Song, Yi-Fan [1, 2]
Zhang, Zhang [1, 2]
Shan, Caifeng [3, 4]
Wang, Liang [1, 2]
Affiliations
[1] Univ Chinese Acad Sci UCAS, Sch Artificial Intelligence, Beijing 100190, Peoples R China
[2] Chinese Acad Sci CASIA, Inst Automat, Ctr Res Intelligent Percept & Comp CRIPAC, Natl Lab Pattern Recognit NLPR, Beijing 100190, Peoples R China
[3] Shandong Univ Sci & Technol SDUST, Coll Elect Engn & Automation, Qingdao 266590, Peoples R China
[4] Chinese Acad Sci CAS AIR, Artificial Intelligence Res, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Action recognition; skeleton sequence; graph convolutional network; EfficientNet; separable convolution; PERSON REIDENTIFICATION;
DOI
10.1109/TPAMI.2022.3157033
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
One essential problem in skeleton-based action recognition is how to extract discriminative features over all skeleton joints. However, recent State-Of-The-Art (SOTA) models for this task tend to be exceedingly sophisticated and over-parameterized. Their low efficiency in training and inference raises the cost of validating model architectures on large-scale datasets. To address this issue, recent advanced separable convolutional layers are embedded into an early-fused Multiple Input Branches (MIB) network, constructing an efficient Graph Convolutional Network (GCN) baseline for skeleton-based action recognition. In addition, based on this baseline, we design a compound scaling strategy to expand the model's width and depth synchronously, and eventually obtain a family of efficient GCN baselines with high accuracy and few trainable parameters, termed EfficientGCN-Bx, where "x" denotes the scaling coefficient. On two large-scale datasets, i.e., NTU RGB+D 60 and 120, the proposed EfficientGCN-B4 baseline outperforms other SOTA methods, e.g., achieving 92.1% accuracy on the cross-subject benchmark of the NTU 60 dataset, while being 5.82x smaller and 5.85x faster than MS-G3D, one of the SOTA methods. The PyTorch source code and the pretrained models are available at https://github.com/yfsong0709/EfficientGCNv1.
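As a rough illustration of the two ideas named in the abstract, the sketch below compares the weight count of a standard convolution with that of a depthwise separable one, and shows one possible form of compound scaling in which width and depth grow together with a single coefficient x. The function names, the 2D kernel shape, the base width and depth, and the constants `alpha` and `beta` are illustrative assumptions; none of them is taken from the abstract or from the released code.

```python
import math


def standard_conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def separable_conv_params(c_in, c_out, k):
    """Weight count of a depthwise k x k conv plus a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def compound_scale(base_width, base_depth, x, alpha=1.2, beta=1.35):
    """Scale depth and width together with one coefficient x.

    alpha and beta are hypothetical growth rates chosen for illustration.
    """
    depth = math.ceil(base_depth * alpha ** x)
    width = math.ceil(base_width * beta ** x)
    return width, depth


if __name__ == "__main__":
    print(standard_conv_params(64, 128, 3))
    print(separable_conv_params(64, 128, 3))
    for x in range(5):
        print(x, compound_scale(64, 2, x))
```

With 64 input channels, 128 output channels, and a 3 x 3 kernel, the separable layer needs far fewer weights than the standard one, which is the kind of saving the abstract attributes to separable convolutional layers.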
Pages: 1474 - 1488
Page count: 15
Related papers
50 in total
  • [1] Stronger, Faster and More Explainable: A Graph Convolutional Baseline for Skeleton-based Action Recognition
    Song, Yi-Fan
    Zhang, Zhang
    Shan, Caifeng
    Wang, Liang
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020 : 1625 - 1633
  • [2] Multi-channel network: Constructing efficient GCN baselines for skeleton-based action recognition
    Hou, Ruijie
    Wang, Zhihao
    Ren, Ruimin
    Cao, Yang
    Wang, Zhao
    COMPUTERS & GRAPHICS-UK, 2023, 110 : 111 - 117
  • [3] Revisiting Skeleton-based Action Recognition
    Duan, Haodong
    Zhao, Yue
    Chen, Kai
    Lin, Dahua
    Dai, Bo
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022 : 2959 - 2968
  • [4] Self-Constructing Temporal Excitation Graph for Skeleton-Based Action Recognition
    Li, Jianan
    Zhao, Zhifu
    Yang, Jiawen
    Chu, Hua
    Li, Qingshan
    IEEE SENSORS JOURNAL, 2023, 23 (19) : 23079 - 23091
  • [5] RELATIONAL NETWORK FOR SKELETON-BASED ACTION RECOGNITION
    Zheng, Wu
    Li, Lin
    Zhang, Zhaoxiang
    Huang, Yan
    Wang, Liang
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019 : 826 - 831
  • [6] SpatioTemporal focus for skeleton-based action recognition
    Wu, Liyu
    Zhang, Can
    Zou, Yuexian
    PATTERN RECOGNITION, 2023, 136
  • [7] Generative Action Description Prompts for Skeleton-based Action Recognition
    Xiang, Wangmeng
    Li, Chao
    Zhou, Yuxuan
    Wang, Biao
    Zhang, Lei
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023 : 10242 - 10251
  • [8] A Novel Skeleton Spatial Pyramid Model for Skeleton-based Action Recognition
    Li, Yanshan
    Guo, Tianyu
    Xia, Rongjie
    Liu, Xing
    2019 IEEE 4TH INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP 2019), 2019 : 16 - 20
  • [9] Skeleton MixFormer: Multivariate Topology Representation for Skeleton-based Action Recognition
    Xin, Wentian
    Miao, Qiguang
    Liu, Yi
    Liu, Ruyi
    Pun, Chi-Man
    Shi, Cheng
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023 : 2211 - 2220
  • [10] Fully Attentional Network for Skeleton-Based Action Recognition
    Liu, Caifeng
    Zhou, Hongcheng
    IEEE ACCESS, 2023, 11 : 20478 - 20485