Skeleton-based action recognition with multi-stream, multi-scale dilated spatial-temporal graph convolution network

Times cited: 7
Authors
Zhang, Haiping [1, 2]
Liu, Xu [3]
Yu, Dongjin [1]
Guan, Liming [2]
Wang, Dongjing [1]
Ma, Conghao [3]
Hu, Zepeng [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci, Hangzhou 310018, Zhejiang, Peoples R China
[2] Hangzhou Dianzi Univ, Sch Informat Engn, Hangzhou 310018, Zhejiang, Peoples R China
[3] Hangzhou Dianzi Univ, Sch Elect & Informat, Hangzhou 310018, Zhejiang, Peoples R China
Keywords
Action recognition; Skeleton; GCN; Multi-stream network; Visual surveillance
DOI
10.1007/s10489-022-04365-8
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Action recognition based on skeleton data is receiving increasing attention in computer vision because it adapts well to dynamic environments and complex backgrounds. Representing human skeleton data as spatial-temporal graphs and processing them with graph convolutional networks (GCNs) has been shown to yield good recognition results. However, existing GCN methods often extract temporal features with a fixed-size convolution kernel, which may be ill-suited to models built from multiple stacked layers. In addition, fusing the streams of a multi-stream network in equal proportions ignores the differences in recognition ability among the streams. Both issues can degrade the final recognition result. In this paper, we propose (1) a multi-scale dilated temporal graph convolution layer (MDTGCL) and (2) a multi-branch feature fusion (MFF) structure. The MDTGCL uses multiple convolution kernels and dilated convolution to better suit the multi-layer structure of the GCN model and to capture contextual spatial-temporal information over longer periods, yielding richer behavioural features. The MFF performs a weighted fusion of the multi-stream outputs to obtain the final recognition result. Because higher-order skeleton data are highly discriminative and conducive to human action recognition, we jointly model the spatial information of joints and bones, several forms of their motion, and the angle information of bones. Combining these components, we designed a multi-stream, multi-scale dilated spatial-temporal graph convolutional network (2M-STGCN) and conducted extensive experiments on two large datasets (NTU RGB+D 60 and Kinetics Skeleton 400), which show that our model achieves state-of-the-art (SOTA) performance.
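The two ideas in the abstract, multi-scale dilated temporal convolution and weighted fusion of stream outputs, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it covers only the temporal (time-axis) part of the layer, and it assumes a single feature channel for one joint over T frames, "same" zero padding, odd kernel sizes, and an MFF that reduces to a normalized weighted sum of per-stream class scores. The function names (`dilated_temporal_conv`, `multi_scale_dilated_conv`, `weighted_stream_fusion`, `receptive_field`) and the example kernels, dilations, and weights are hypothetical. A kernel of size k with dilation d spans (k - 1) * d + 1 frames, which is how branches with larger dilations see longer temporal context without adding weights.

```python
from typing import List, Sequence, Tuple


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of frames covered by one dilated kernel: (k - 1) * d + 1."""
    return (kernel_size - 1) * dilation + 1


def dilated_temporal_conv(x: Sequence[float], kernel: Sequence[float],
                          dilation: int) -> List[float]:
    """Dilated 1-D convolution along time with 'same' zero padding.

    Like deep-learning 'convolution' layers, this is a cross-correlation
    (the kernel is not flipped). Assumes an odd kernel size so the kernel
    has a well-defined centre tap.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    centre = (k - 1) // 2
    T = len(x)
    out: List[float] = []
    for t in range(T):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t + (i - centre) * dilation  # taps are `dilation` frames apart
            if 0 <= idx < T:  # frames outside the sequence count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def multi_scale_dilated_conv(x: Sequence[float],
                             branches: Sequence[Tuple[Sequence[float], int]]
                             ) -> List[List[float]]:
    """Run one dilated temporal conv per (kernel, dilation) branch.

    A real layer would stack the branch outputs along a channel axis;
    here each branch output is simply returned as its own list.
    """
    return [dilated_temporal_conv(x, kernel, d) for kernel, d in branches]


def weighted_stream_fusion(stream_scores: Sequence[Sequence[float]],
                           weights: Sequence[float]) -> List[float]:
    """Fuse per-stream class scores with a normalized weighted sum (assumed form of MFF)."""
    total = sum(weights)
    n_classes = len(stream_scores[0])
    return [sum(w * s[c] for w, s in zip(weights, stream_scores)) / total
            for c in range(n_classes)]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the predicted class)."""
    return max(range(len(values)), key=lambda i: values[i])


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]  # one joint coordinate over 5 frames
    branches = [([1.0, 1.0, 1.0], 1), ([1.0, 1.0, 1.0], 2)]
    print(multi_scale_dilated_conv(x, branches))
    print(receptive_field(3, 1), receptive_field(3, 2))

    # Two hypothetical streams (e.g. joint and bone) scoring two classes.
    scores = [[0.6, 0.4], [0.3, 0.7]]
    print(argmax(weighted_stream_fusion(scores, [1.0, 1.0])))  # equal weights
    print(argmax(weighted_stream_fusion(scores, [0.8, 0.2])))  # first stream weighted more
```

In this toy run, equal weights make the fused scores favour class 1, while giving the first stream a larger weight flips the prediction to class 0. This mirrors the abstract's point that equal-proportion fusion can discard differences in how reliable each stream is.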
Pages: 17629-17643
Number of pages: 15