A Spatio-Temporal Motion Network for Action Recognition Based on Spatial Attention

Cited by: 10
Authors
Yang, Qi [1 ,2 ]
Lu, Tongwei [1 ,2 ]
Zhou, Huabing [1 ,2 ]
Affiliations
[1] Wuhan Inst Technol, Sch Comp Sci & Engn, Wuhan 430205, Peoples R China
[2] Wuhan Inst Technol, Hubei Key Lab Intelligent Robot, Wuhan 430205, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
temporal modeling; spatio-temporal motion; group convolution; spatial attention;
DOI
10.3390/e24030368
CLC Number
O4 [Physics];
Subject Classification Code
0702;
Abstract
Temporal modeling is the key to action recognition in videos, but traditional 2D CNNs do not capture temporal relationships well. 3D CNNs can achieve good performance, but they are computationally intensive and difficult to deploy on existing devices. To address these problems, we design a generic and effective module called the spatio-temporal motion network (SMNet). SMNet keeps the complexity at the level of 2D CNNs and reduces the computational cost of the algorithm while achieving performance comparable to 3D CNNs. SMNet contains a spatio-temporal excitation module (SE) and a motion excitation module (ME). The SE module uses group convolution to fuse temporal information, which reduces the number of parameters in the network, and uses spatial attention to extract spatial information. The ME module uses the differences between adjacent frames to extract feature-level motion patterns, which effectively encode motion features and help identify actions efficiently. We use ResNet-50 as the backbone network and insert SMNet into its residual blocks to form a simple and effective action recognition network. Experimental results on three datasets, Something-Something V1, Something-Something V2, and Kinetics-400, show that it outperforms state-of-the-art action recognition networks.
Pages: 19
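The abstract describes two excitation branches inserted into ResNet-50 residual blocks: an SE branch that fuses temporal context with group convolution and re-weights features with spatial attention, and an ME branch that gates features with adjacent-frame differences. The PyTorch sketch below illustrates one plausible reading of that design; the class names, kernel sizes, reduction ratio, segment count, and residual fusion scheme are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the SE/ME ideas from the abstract (assumed details).
import torch
import torch.nn as nn


class SpatioTemporalExcitation(nn.Module):
    """SE branch: grouped temporal convolution + spatial attention (assumed form)."""

    def __init__(self, channels: int, n_segments: int, groups: int = 8):
        super().__init__()
        self.n_segments = n_segments
        # Grouped conv along the temporal axis to fuse adjacent-frame context cheaply.
        self.temporal_conv = nn.Conv3d(
            channels, channels, kernel_size=(3, 1, 1),
            padding=(1, 0, 0), groups=groups, bias=False)
        # 1x1 conv producing a single-channel spatial attention map.
        self.spatial_att = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        nt, c, h, w = x.shape
        n = nt // self.n_segments
        # [N*T, C, H, W] -> [N, C, T, H, W] for the temporal convolution.
        feat = x.view(n, self.n_segments, c, h, w).transpose(1, 2)
        feat = self.temporal_conv(feat)
        feat = feat.transpose(1, 2).reshape(nt, c, h, w)
        att = torch.sigmoid(self.spatial_att(feat))        # [N*T, 1, H, W]
        return x * att                                      # spatially re-weighted features


class MotionExcitation(nn.Module):
    """ME branch: feature-level differences between adjacent frames (assumed form)."""

    def __init__(self, channels: int, n_segments: int, reduction: int = 16):
        super().__init__()
        self.n_segments = n_segments
        mid = max(channels // reduction, 1)
        self.reduce = nn.Conv2d(channels, mid, kernel_size=1, bias=False)
        self.expand = nn.Conv2d(mid, channels, kernel_size=1, bias=False)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        nt, c, h, w = x.shape
        n = nt // self.n_segments
        feat = self.reduce(x).view(n, self.n_segments, -1, h, w)
        # Difference between frame t+1 and frame t encodes motion; pad the last step.
        diff = feat[:, 1:] - feat[:, :-1]
        diff = torch.cat([diff, torch.zeros_like(diff[:, :1])], dim=1)
        diff = diff.reshape(nt, -1, h, w)
        att = torch.sigmoid(self.expand(self.pool(diff)))   # channel-wise motion gate
        return x * att


class SMNetBlock(nn.Module):
    """Both branches applied inside a residual block, as the abstract suggests."""

    def __init__(self, channels: int, n_segments: int = 8):
        super().__init__()
        self.se = SpatioTemporalExcitation(channels, n_segments)
        self.me = MotionExcitation(channels, n_segments)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.se(x) + self.me(x)                   # residual fusion (assumed)


if __name__ == "__main__":
    clip = torch.randn(2 * 8, 64, 56, 56)                    # 2 videos x 8 frames
    print(SMNetBlock(64, n_segments=8)(clip).shape)          # torch.Size([16, 64, 56, 56])
```

Both branches are gating operations on 2D feature maps, so the extra cost over a plain ResNet-50 block stays small, which is consistent with the abstract's claim of 2D-level complexity.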