Multi-stream Global-Local Motion Fusion Network for skeleton-based action recognition

Cited by: 0
Authors
Qi, Yanpeng [1]
Pang, Chen [1]
Liu, Yiliang [1, 3]
Lyu, Lei [1, 2]
Affiliations
[1] Shandong Normal Univ, Sch Informat Sci & Engn, Jinan, Peoples R China
[2] Shandong Prov Key Lab Distributed Comp Software No, Jinan, Peoples R China
[3] Shandong Prov Acad Educ Recruitment & Examinat, Jinan, Peoples R China
Keywords
Action recognition; Grouping graph convolution; Spatial-temporal self-attention; Multi-stream fusion strategy; LSTM
DOI
10.1016/j.asoc.2023.110536
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Skeleton-based action recognition is widely used in areas such as human-machine interaction and virtual reality. Because graph convolutional networks (GCNs) are well suited to representing structured data, they have been applied to this task by modeling human body skeletons as spatial-temporal graphs. However, most existing GCN-based methods ignore the diversity of motion information across the channels of the input features. Enhancing the ability to capture long-range global correlations in the spatial and temporal dimensions is also a fundamental challenge. In this work, we propose a novel multi-stream framework, the Global-Local Motion Fusion Network (GLMFN), which integrates global and local motion information across the spatial and temporal dimensions. Specifically, we design a grouping graph convolution module to strengthen the aggregation of local spatial motion information. In addition, to learn richer semantic features, we propose two modules based on the self-attention operator: a spatial self-attention module and a temporal self-attention module. The former extracts long-range spatial motion relationships, while the latter captures long-range temporal motion relationships. Moreover, we present a multi-stream fusion strategy that applies a series of treatments to body joints to improve recognition performance. To validate the efficacy and efficiency of the proposed model, we perform extensive experiments on the NTU-RGBD and NTU-RGBD-120 datasets, and our method achieves state-of-the-art performance on both. © 2023 Published by Elsevier B.V.
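The abstract names a self-attention operator applied over joints (spatial) and over frames (temporal) but gives no implementation details. As a rough illustration only, the following sketch shows plain scaled dot-product self-attention across the joints of a single frame. The function names, the toy data, and the use of identity query/key/value projections are assumptions made for this example; the GLMFN modules presumably use learned projections and may differ in other ways not stated here.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_self_attention(features):
    """Scaled dot-product self-attention over the joints of one frame.

    features: list of per-joint feature vectors (all the same length).
    Assumption for illustration: queries, keys and values are the raw
    features themselves (no learned projection matrices).
    Returns (attended_features, attention_weights).
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    attended, all_weights = [], []
    for query in features:
        # Similarity of this joint to every joint, including itself.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in features]
        weights = softmax(scores)
        all_weights.append(weights)
        # Each output is a weighted mix of all joints' features, so joints
        # that are far apart in the skeleton graph can still interact.
        attended.append([sum(w * value[c] for w, value in zip(weights, features))
                         for c in range(dim)])
    return attended, all_weights


# Hypothetical toy input: 4 joints with 2 feature channels each.
frame = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
out, attn = joint_self_attention(frame)
```

Under the same assumptions, a temporal variant would apply the identical operator to the sequence of feature vectors of one joint across frames rather than to the joints of one frame.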
Pages: 13