Skeleton-based action recognition with multi-stream, multi-scale dilated spatial-temporal graph convolution network

Cited by: 7
Authors
Zhang, Haiping [1, 2]
Liu, Xu [3]
Yu, Dongjin [1]
Guan, Liming [2]
Wang, Dongjing [1]
Ma, Conghao [3]
Hu, Zepeng [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci, Hangzhou 310018, Zhejiang, Peoples R China
[2] Hangzhou Dianzi Univ, Sch Informat Engn, Hangzhou 310018, Zhejiang, Peoples R China
[3] Hangzhou Dianzi Univ, Sch Elect & Informat, Hangzhou 310018, Zhejiang, Peoples R China
Keywords
Action recognition; Skeleton; GCN; Multi-stream network; Visual surveillance
DOI
10.1007/s10489-022-04365-8
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Skeleton-based action recognition is receiving increasing attention in computer vision because it adapts well to dynamic environments and complex backgrounds. Representing human skeleton data as spatial-temporal graphs and processing them with graph convolutional networks (GCNs) has been shown to produce good recognition results. However, existing GCN methods often use a fixed-size convolution kernel to extract temporal features, which may not suit multi-level model structures. In addition, fusing the streams of a multi-stream network in equal proportion may ignore differences in the recognition ability of the individual streams. Both issues affect the final recognition result. In this paper, we propose (1) a multi-scale dilated temporal graph convolution layer (MDTGCL) and (2) a multi-branch feature fusion (MFF) structure. The MDTGCL uses multiple convolution kernels and dilated convolution to better adapt to the multi-layer structure of the GCN model and to capture contextual spatial-temporal information over longer periods, yielding richer behavioural features. MFF performs weighted fusion of the multi-stream outputs to obtain the final recognition result. Because higher-order skeleton data are highly discriminative and more conducive to human action recognition, we jointly model spatial information on joints and bones, their multiple motion information, and angle information pertaining to bones. Combining the above, we design a multi-stream, multi-scale dilated spatial-temporal graph convolutional network (2M-STGCN) and conduct extensive experiments on two large datasets (NTU RGB+D 60 and Kinetics Skeleton 400), which show that our model performs at a state-of-the-art (SOTA) level.
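The two ideas named in the abstract, multi-scale dilated temporal convolution and weighted fusion of stream outputs, can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the function names, the averaging of branches, the kernel values, and the fusion weights below are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
from typing import Dict, List, Sequence, Tuple


def dilated_temporal_conv(seq: Sequence[float],
                          kernel: Sequence[float],
                          dilation: int) -> List[float]:
    """1D convolution along time with zero 'same' padding and a dilation factor."""
    center = len(kernel) // 2
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for i, w in enumerate(kernel):
            src = t + (i - center) * dilation  # taps are spaced `dilation` frames apart
            if 0 <= src < len(seq):            # frames outside the sequence count as zero
                acc += w * seq[src]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of frames spanned by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def multi_scale_dilated(seq: Sequence[float],
                        branches: Sequence[Tuple[Sequence[float], int]]) -> List[float]:
    """Run several (kernel, dilation) branches and average them.

    Averaging is an assumption; the paper may combine branches differently.
    """
    outputs = [dilated_temporal_conv(seq, k, d) for k, d in branches]
    return [sum(vals) / len(outputs) for vals in zip(*outputs)]


def weighted_fusion(stream_scores: Dict[str, List[float]],
                    weights: Dict[str, float]) -> List[float]:
    """Fuse per-class scores from several streams with per-stream weights."""
    num_classes = len(next(iter(stream_scores.values())))
    fused = [0.0] * num_classes
    for name, scores in stream_scores.items():
        for c, s in enumerate(scores):
            fused[c] += weights[name] * s
    return fused


# A single-frame impulse shows which frames each branch can "see".
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
ones3 = [1.0, 1.0, 1.0]
multi = multi_scale_dilated(impulse, [(ones3, 1), (ones3, 2)])

# Hypothetical class scores from two streams (two classes).
scores = {"joint": [0.6, 0.4], "bone": [0.3, 0.7]}
equal = weighted_fusion(scores, {"joint": 0.5, "bone": 0.5})
weighted = weighted_fusion(scores, {"joint": 0.7, "bone": 0.3})
pred_equal = equal.index(max(equal))
pred_weighted = weighted.index(max(weighted))
```

With the same kernel size of 3, a dilation of 2 widens the temporal span from 3 to 5 frames without adding parameters, which is the sense in which dilation provides longer temporal context. In the toy fusion example, equal weighting and the hypothetical 0.7/0.3 weighting select different classes, which illustrates why equal-proportion fusion can matter when streams differ in reliability.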
Pages: 17629-17643
Number of pages: 15