Skeleton-based action recognition with multi-stream, multi-scale dilated spatial-temporal graph convolution network

Cited by: 7
Authors
Zhang, Haiping [1, 2]
Liu, Xu [3]
Yu, Dongjin [1]
Guan, Liming [2]
Wang, Dongjing [1]
Ma, Conghao [3]
Hu, Zepeng [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci, Hangzhou 310018, Zhejiang, Peoples R China
[2] Hangzhou Dianzi Univ, Sch Informat Engn, Hangzhou 310018, Zhejiang, Peoples R China
[3] Hangzhou Dianzi Univ, Sch Elect & Informat, Hangzhou 310018, Zhejiang, Peoples R China
Keywords
Action recognition; Skeleton; GCN; Multi-stream network; Visual surveillance
DOI
10.1007/s10489-022-04365-8
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Action recognition techniques based on skeleton data are receiving increasing attention in computer vision because they adapt well to dynamic environments and complex backgrounds. Representing human skeleton data as spatial-temporal graphs and processing them with graph convolutional networks (GCNs) has been shown to produce good recognition results. However, existing GCN methods often extract time-domain features with a fixed-size convolution kernel, which may not suit multi-level model structures. In addition, fusing the streams of a multi-stream network in equal proportion may ignore differences in the recognition ability of the individual streams. Both issues affect the final recognition result. In this paper, we propose (1) a multi-scale dilated temporal graph convolution layer (MDTGCL) and (2) a multi-branch feature fusion (MFF) structure. The MDTGCL uses multiple convolution kernels and dilated convolution to better adapt to the multi-layer structure of the GCN model and to capture contextual spatial-temporal information over longer periods, resulting in richer behavioural features. MFF performs weighted fusion of the multi-stream outputs to obtain the final recognition result. Because higher-order skeleton data are highly discriminative and more conducive to human action recognition, we jointly model the spatial information of joints and bones, their multiple motion information, and the angle information of bones. Combining the above, we designed a multi-stream, multi-scale dilated spatial-temporal graph convolutional network (2M-STGCN) and conducted extensive experiments on two large datasets (NTU RGB+D 60 and Kinetics Skeleton 400), which showed that our model performs at a state-of-the-art (SOTA) level.
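The two ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the averaging kernels, the branch settings, and the fusion weights are all hypothetical, and the real MDTGCL operates on graph features inside a GCN rather than on a single 1D sequence. The sketch only shows (a) how branches with different dilation rates cover different temporal spans and (b) how per-stream class scores can be combined with unequal weights.

```python
import numpy as np

def dilated_temporal_conv(x, kernel, dilation):
    """Cross-correlate a (T,) sequence with an odd-length kernel along time.

    Zero padding keeps the output length equal to T ("same" padding).
    """
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, pad)
    out = np.zeros(len(x), dtype=float)
    for t in range(len(x)):
        for i in range(k):
            # Taps are spaced `dilation` frames apart, centred on frame t.
            out[t] += kernel[i] * xp[t + i * dilation]
    return out

def receptive_field(kernel_size, dilation):
    """Number of frames spanned by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1

def multi_scale_temporal(x, branches):
    """Run several (kernel, dilation) branches and stack them: shape (B, T)."""
    return np.stack([dilated_temporal_conv(x, k, d) for k, d in branches])

def weighted_fusion(stream_scores, weights):
    """Fuse (S, C) per-stream class scores with normalised weights of shape (S,)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.tensordot(w, np.asarray(stream_scores, dtype=float), axes=1)

# Hypothetical toy input: one feature channel over 8 frames.
x = np.arange(8, dtype=float)
# Two illustrative branches: same kernel size, dilation 1 and 2.
branches = [(np.ones(3) / 3, 1), (np.ones(3) / 3, 2)]
features = multi_scale_temporal(x, branches)

# Hypothetical scores from three streams over two classes.
scores = np.array([[0.6, 0.4], [0.2, 0.8], [0.5, 0.5]])
fused = weighted_fusion(scores, [0.5, 0.3, 0.2])
predicted = int(np.argmax(fused))
```

With a kernel size of 3, the dilation-2 branch spans 5 frames instead of 3 at the same parameter count, which is the sense in which dilation yields longer temporal context. How the fusion weights are chosen in 2M-STGCN is not stated in the abstract; the values above are placeholders.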
Pages: 17629 - 17643
Page count: 15
Related papers
50 results in total
  • [1] Skeleton-based action recognition with multi-stream, multi-scale dilated spatial-temporal graph convolution network
    Haiping Zhang
    Xu Liu
    Dongjin Yu
    Liming Guan
    Dongjing Wang
    Conghao Ma
    Zepeng Hu
    Applied Intelligence, 2023, 53 : 17629 - 17643
  • [2] Multi-scale spatial-temporal convolutional neural network for skeleton-based action recognition
    Cheng, Qin
    Cheng, Jun
    Ren, Ziliang
    Zhang, Qieshi
    Liu, Jianming
    PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (03) : 1303 - 1315
  • [3] Multi-Scale Adaptive Graph Convolution Network for Skeleton-Based Action Recognition
    Hu, Huangshui
    Fang, Yue
    Han, Mei
    Qi, Xingshuo
    IEEE ACCESS, 2024, 12 : 16868 - 16880
  • [4] Multi-Scale Mixed Dense Graph Convolution Network for Skeleton-Based Action Recognition
    Xia, Hailun
    Gao, Xinkai
    IEEE ACCESS, 2021, 9 : 36475 - 36484
  • [5] Multi-scale Dilated Attention Graph Convolutional Network for Skeleton-Based Action Recognition
    Shu, Yang
    Li, Wanggen
    Li, Doudou
    Gao, Kun
    Jie, Biao
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 16 - 28
  • [6] MSST-RT: Multi-Stream Spatial-Temporal Relative Transformer for Skeleton-Based Action Recognition
    Sun, Yan
    Shen, Yixin
    Ma, Liyan
    SENSORS, 2021, 21 (16)
  • [7] Multi-stream adaptive 3D attention graph convolution network for skeleton-based action recognition
    Yu, Lubin
    Tian, Lianfang
    Du, Qiliang
    Bhutto, Jameel Ahmed
    APPLIED INTELLIGENCE, 2023, 53 (12) : 14838 - 14854
  • [8] Multi-stream adaptive 3D attention graph convolution network for skeleton-based action recognition
    Lubin Yu
    Lianfang Tian
    Qiliang Du
    Jameel Ahmed Bhutto
    Applied Intelligence, 2023, 53 : 14838 - 14854
  • [9] Skeleton-based multi-stream adaptive-attentional sub-graph convolution network for action recognition
    Liu, Huan
    Wu, Jian
    Ma, Haokai
    Yan, Yuqi
    He, Rui
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (1) : 2935 - 2958
  • [10] Skeleton-based multi-stream adaptive-attentional sub-graph convolution network for action recognition
    Huan Liu
    Jian Wu
    Haokai Ma
    Yuqi Yan
    Rui He
    Multimedia Tools and Applications, 2024, 83 : 2935 - 2958