Skeleton-based action recognition with multi-stream, multi-scale dilated spatial-temporal graph convolution network

Cited by: 7
Authors
Zhang, Haiping [1 ,2 ]
Liu, Xu [3 ]
Yu, Dongjin [1 ]
Guan, Liming [2 ]
Wang, Dongjing [1 ]
Ma, Conghao [3 ]
Hu, Zepeng [1 ]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci, Hangzhou 310018, Zhejiang, Peoples R China
[2] Hangzhou Dianzi Univ, Sch Informat Engn, Hangzhou 310018, Zhejiang, Peoples R China
[3] Hangzhou Dianzi Univ, Sch Elect & Informat, Hangzhou 310018, Zhejiang, Peoples R China
Keywords
Action recognition; Skeleton; GCN; Multi-stream network; VISUAL SURVEILLANCE;
DOI
10.1007/s10489-022-04365-8
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Action recognition techniques based on skeleton data are receiving increasing attention in computer vision due to their ability to adapt to dynamic environments and complex backgrounds. Modeling human skeleton data as spatial-temporal graphs and processing them with graph convolutional networks (GCNs) has been shown to produce good recognition results. However, existing GCN methods often use a fixed-size convolution kernel to extract time-domain features, which may not suit multi-level model structures, and equal-proportion fusion of the streams in a multi-stream network may ignore differences in the recognition ability of individual streams; both issues affect the final recognition result. In this paper, we propose (1) a multi-scale dilated temporal graph convolution layer (MDTGCL) and (2) a multi-branch feature fusion (MFF) structure. The MDTGCL uses multiple convolution kernels and dilated convolution to better adapt to the multi-layer structure of the GCN model and to capture longer-range contextual spatial-temporal information, yielding richer behavioural features. MFF performs weighted fusion of the multi-stream outputs to obtain the final recognition results. As higher-order skeleton data are highly discriminative and more conducive to human action recognition, we jointly model the spatial information of joints and bones, their motion information, and the angle information of bones. Combining the above, we designed a multi-stream, multi-scale dilated spatial-temporal graph convolutional network (2M-STGCN) model and conducted extensive experiments on two large datasets (NTU RGB+D 60 and Kinetics Skeleton 400), which show that our model performs at SOTA level.
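The two ideas in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: a fixed averaging kernel stands in for learned convolution weights, and the (kernel size, dilation) pairs and stream weights are illustrative assumptions. It shows only the structure of a multi-scale dilated temporal convolution over one joint's feature sequence, and MFF-style weighted fusion of per-stream class scores.

```python
import numpy as np

def dilated_temporal_conv(x, kernel_size, dilation):
    """Temporal convolution with 'same' padding over x of shape (T, C).

    The averaging kernel is a stand-in for learned weights; only the
    receptive-field structure (kernel_size, dilation) matters here.
    """
    T, C = x.shape
    pad = (kernel_size - 1) * dilation // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for t in range(T):
        # taps spaced `dilation` frames apart, centered on frame t
        taps = [xp[t + k * dilation] for k in range(kernel_size)]
        out[t] = np.mean(taps, axis=0)
    return out

def mdtgcl_branch_concat(x, scales=((3, 1), (3, 2), (5, 1))):
    """Multi-scale branch: run several (kernel, dilation) temporal
    convolutions in parallel and concatenate along the channel axis.
    The scale list is an illustrative assumption, not the paper's."""
    return np.concatenate(
        [dilated_temporal_conv(x, k, d) for k, d in scales], axis=1)

def weighted_stream_fusion(stream_scores, weights):
    """MFF-style fusion: normalized weighted sum of per-stream class
    scores (e.g. joint, bone, motion streams)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * s for w, s in zip(weights, np.asarray(stream_scores)))

# Toy usage: 6 frames, 2 channels for one joint; three branches -> 6 channels.
x = np.arange(12.0).reshape(6, 2)
feats = mdtgcl_branch_concat(x)          # shape (6, 6)
# Two hypothetical stream score vectors fused with weights 2:1.
fused = weighted_stream_fusion(
    [np.array([0.2, 0.8]), np.array([0.6, 0.4])], [2, 1])
```

Larger dilations widen the temporal receptive field without extra parameters, which is how the MDTGCL obtains longer-range context; unequal fusion weights let stronger streams dominate the final score.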
Pages: 17629-17643
Page count: 15
Related papers
50 entries in total
  • [41] Multilevel Spatial-Temporal Excited Graph Network for Skeleton-Based Action Recognition
    Zhu, Yisheng
    Shuai, Hui
    Liu, Guangcan
    Liu, Qingshan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 496 - 508
  • [42] Hierarchical adaptive multi-scale hypergraph attention convolution network for skeleton-based action recognition
    Yang, Honghong
    Wang, Sai
    Jiang, Lu
    Su, Yuping
    Zhang, Yumei
    APPLIED SOFT COMPUTING, 2025, 172
  • [43] MTT: Multi-Scale Temporal Transformer for Skeleton-Based Action Recognition
    Kong, Jun
    Bian, Yuhang
    Jiang, Min
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 528 - 532
  • [44] Lighter and faster: A multi-scale adaptive graph convolutional network for skeleton-based action recognition
    Jiang, Yuanjian
    Deng, Hongmin
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 132
  • [45] A Novel Spatial-Temporal Graph for Skeleton-based Driver Action Recognition
    Li, Peng
    Lu, Meiqi
    Zhang, Zhiwei
    Shan, Donghui
    Yang, Yang
    2019 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), 2019, : 3243 - 3248
  • [46] Spatial-temporal graph attention networks for skeleton-based action recognition
    Huang, Qingqing
    Zhou, Fengyu
    He, Jiakai
    Zhao, Yang
    Qin, Runze
    JOURNAL OF ELECTRONIC IMAGING, 2020, 29 (05)
  • [47] Pyramid Spatial-Temporal Graph Transformer for Skeleton-Based Action Recognition
    Chen, Shuo
    Xu, Ke
    Jiang, Xinghao
    Sun, Tanfeng
    APPLIED SCIENCES-BASEL, 2022, 12 (18):
  • [48] Spatial-temporal graph transformer network for skeleton-based temporal action segmentation
    Xiaoyan Tian
    Ye Jin
    Zhao Zhang
    Peng Liu
    Xianglong Tang
    Multimedia Tools and Applications, 2024, 83 : 44273 - 44297
  • [49] Spatial-temporal graph transformer network for skeleton-based temporal action segmentation
    Tian, Xiaoyan
    Jin, Ye
    Zhang, Zhao
    Liu, Peng
    Tang, Xianglong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (15) : 44273 - 44297
  • [50] Multi-stream Global-Local Motion Fusion Network for skeleton-based action recognition
    Qi, Yanpeng
    Pang, Chen
    Liu, Yiliang
    Lyu, Lei
    APPLIED SOFT COMPUTING, 2023, 145