A Spatio-Temporal Motion Network for Action Recognition Based on Spatial Attention

Cited by: 10
Authors
Yang, Qi [1 ,2 ]
Lu, Tongwei [1 ,2 ]
Zhou, Huabing [1 ,2 ]
Affiliations
[1] Wuhan Inst Technol, Sch Comp Sci & Engn, Wuhan 430205, Peoples R China
[2] Wuhan Inst Technol, Hubei Key Lab Intelligent Robot, Wuhan 430205, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
temporal modeling; spatio-temporal motion; group convolution; spatial attention;
DOI
10.3390/e24030368
Chinese Library Classification
O4 [Physics]
Discipline classification code
0702
Abstract
Temporal modeling is key to action recognition in videos, but traditional 2D CNNs do not capture temporal relationships well. 3D CNNs can achieve good performance but are computationally intensive and difficult to deploy on existing devices. To address these problems, we design a generic and effective module called the spatio-temporal motion network (SMNet). SMNet keeps the complexity of a 2D CNN and reduces the computational cost of the algorithm while achieving performance comparable to 3D CNNs. SMNet contains a spatio-temporal excitation module (SE) and a motion excitation module (ME). The SE module uses group convolution to fuse temporal information, reducing the number of network parameters, and uses spatial attention to extract spatial information. The ME module uses the difference between adjacent frames to extract feature-level motion patterns, which effectively encodes motion features and helps identify actions efficiently. We use ResNet-50 as the backbone network and insert SMNet into its residual blocks to form a simple and effective action recognition network. Experimental results on three datasets, namely Something-Something V1, Something-Something V2, and Kinetics-400, show that it outperforms state-of-the-art action recognition networks.
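The core idea behind the ME module, as described in the abstract, is to represent motion at the feature level as the difference between adjacent frames. A minimal NumPy sketch of that differencing step is shown below; note this is an illustration under stated assumptions, not the paper's implementation, which applies learned convolutional transforms around the subtraction and fuses the result back into the residual block.

```python
import numpy as np

def motion_excitation(x):
    """Feature-level motion as adjacent-frame differences (simplified sketch).

    x: array of shape (T, C, H, W) holding features for T frames.
    Returns an array of the same shape; the last frame gets zero motion
    since it has no successor frame to subtract.
    """
    diff = x[1:] - x[:-1]                       # (T-1, C, H, W) frame differences
    pad = np.zeros_like(x[:1])                  # zero motion for the final frame
    return np.concatenate([diff, pad], axis=0)  # back to (T, C, H, W)

# Toy example: 3 frames, 1 channel, 2x2 spatial grid, values increase by 4 per frame.
x = np.arange(12, dtype=float).reshape(3, 1, 2, 2)
m = motion_excitation(x)
print(m.shape)  # (3, 1, 2, 2)
```

In this toy input every pixel grows by 4 between consecutive frames, so the first two motion maps are constant 4 and the padded last map is all zeros; static regions would produce zeros everywhere, which is why differencing highlights moving content.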
Pages: 19
Related Papers
50 records
  • [21] Global Temporal Difference Network for Action Recognition
    Xie, Zhao
    Chen, Jiansong
    Wu, Kewei
    Guo, Dan
    Hong, Richang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 7594 - 7606
  • [22] Residual attention fusion network for video action recognition
    Li, Ao
    Yi, Yang
    Liang, Daan
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 98
  • [23] Interactive facial expression editing based on spatio-temporal coherency
    Chi, Jing
    Gao, Shanshan
    Zhang, Caiming
    VISUAL COMPUTER, 2017, 33 (6-8) : 981 - 991
  • [24] Weakly supervised spatial-temporal attention network driven by tracking and consistency loss for action detection
    Zhu, Jinlei
    Chen, Houjin
    Pan, Pan
    Sun, Jia
    EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2022, 2022 (01)
  • [26] Multipath Attention and Adaptive Gating Network for Video Action Recognition
    Zhang, Haiping
    Hu, Zepeng
    Yu, Dongjin
    Guan, Liming
    Liu, Xu
    Ma, Conghao
    NEURAL PROCESSING LETTERS, 2024, 56 (02)
  • [28] Frequency-driven channel attention-augmented full-scale temporal modeling network for skeleton-based action recognition
    Li, Fanjia
    Zhu, Aichun
    Li, Juanjuan
    Xu, Yonggang
    Zhang, Yandong
    Yin, Hongsheng
    Hua, Gang
    KNOWLEDGE-BASED SYSTEMS, 2022, 256
  • [29] MST-DGCN: A Multi-Scale Spatio-Temporal and Dynamic Graph Convolution Fusion Network for Electroencephalogram Recognition of Motor Imagery
    Chen, Yuanling
    Liu, Peisen
    Li, Duan
    ELECTRONICS, 2024, 13 (11)
  • [30] Graph convolutional network with STC attention and adaptive normalization for skeleton-based action recognition
    Zhou, Haiyun
    Xiang, Xuezhi
    Qiu, Yujian
    Liu, Xuzhao
    IMAGING SCIENCE JOURNAL, 2023, 71 (07) : 636 - 646