Skeleton-based action recognition with multi-stream, multi-scale dilated spatial-temporal graph convolution network

Cited by: 7
Authors
Zhang, Haiping [1, 2]
Liu, Xu [3]
Yu, Dongjin [1]
Guan, Liming [2]
Wang, Dongjing [1]
Ma, Conghao [3]
Hu, Zepeng [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci, Hangzhou 310018, Zhejiang, Peoples R China
[2] Hangzhou Dianzi Univ, Sch Informat Engn, Hangzhou 310018, Zhejiang, Peoples R China
[3] Hangzhou Dianzi Univ, Sch Elect & Informat, Hangzhou 310018, Zhejiang, Peoples R China
Keywords
Action recognition; Skeleton; GCN; Multi-stream network; Visual surveillance
DOI
10.1007/s10489-022-04365-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Skeleton-based action recognition is receiving increasing attention in computer vision because it adapts well to dynamic environments and complex backgrounds. Modelling human skeleton data as spatial-temporal graphs and processing them with graph convolutional networks (GCNs) has been shown to produce good recognition results. However, existing GCN methods often extract temporal features with a fixed-size convolution kernel, which may not suit multi-level model structures. In addition, fusing the streams of a multi-stream network in equal proportion may ignore differences in the recognition ability of individual streams. Both issues affect the final recognition result. In this paper, we propose (1) a multi-scale dilated temporal graph convolution layer (MDTGCL) and (2) a multi-branch feature fusion (MFF) structure. The MDTGCL uses multiple convolution kernels and dilated convolution to better adapt to the multi-layer structure of the GCN model and to capture spatial-temporal context over longer periods, resulting in richer behavioural features. MFF performs weighted fusion of the multi-stream outputs to obtain the final recognition result. Because higher-order skeleton data are highly discriminative and more conducive to human action recognition, we jointly model spatial information on joints and bones, their multiple motion information, and angle information pertaining to bones. Combining the above, we designed a multi-stream, multi-scale dilated spatial-temporal graph convolutional network (2M-STGCN) and conducted extensive experiments on two large datasets (NTU RGB+D 60 and Kinetics Skeleton 400), which showed that our model performs at a state-of-the-art (SOTA) level.
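The two ideas in the abstract, multi-scale dilated temporal convolution and weighted fusion of stream outputs, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the kernel sizes, the dilation rates, the summation of branches, and the fusion weights are all hypothetical choices made only to show the mechanics on a single 1D feature sequence.

```python
from typing import List, Sequence, Tuple


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Zero-padded 1D dilated convolution (cross-correlation) along the time axis."""
    span = (len(kernel) - 1) * dilation   # distance between the first and last tap
    pad = span // 2                       # centres the kernel when span is even
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - pad + i * dilation
            if 0 <= idx < len(x):         # taps outside the sequence read zero padding
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of time steps covered by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def multi_scale_temporal(x: Sequence[float],
                         branches: Sequence[Tuple[Sequence[float], int]]) -> List[float]:
    """Run several (kernel, dilation) branches and sum them element-wise.

    Summation is an assumption; the abstract does not say how branches are merged.
    """
    outputs = [dilated_conv1d(x, kernel, d) for kernel, d in branches]
    return [sum(vals) for vals in zip(*outputs)]


def weighted_fusion(stream_scores: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Weighted sum of per-class scores across streams (hypothetical weights)."""
    num_classes = len(stream_scores[0])
    return [sum(w * s[c] for w, s in zip(weights, stream_scores))
            for c in range(num_classes)]


# A unit impulse shows which time steps each branch reaches.
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
ones = [1.0, 1.0, 1.0]
multi = multi_scale_temporal(impulse, [(ones, 1), (ones, 2)])

# Two hypothetical streams (e.g. joint and bone) scoring two classes.
joint_scores, bone_scores = [0.6, 0.4], [0.3, 0.7]
equal = weighted_fusion([joint_scores, bone_scores], [0.5, 0.5])
weighted = weighted_fusion([joint_scores, bone_scores], [0.8, 0.2])
```

With the same 3-tap kernel, dilation 2 covers 5 time steps instead of 3, which is how dilation lengthens temporal context without adding parameters. In the fusion example, equal weights favour the second class while the hypothetical 0.8/0.2 weights favour the first, showing how unequal stream weighting can change the final prediction.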
Pages: 17629-17643
Page count: 15