STM: SpatioTemporal and Motion Encoding for Action Recognition

Cited by: 337
Authors
Jiang, Boyuan [1, 3]
Wang, MengMeng [2]
Gan, Weihao [2]
Wu, Wei [2]
Yan, Junjie [2]
Affiliations
[1] Zhejiang Univ, Hangzhou, Peoples R China
[2] SenseTime Grp Ltd, Hong Kong, Peoples R China
[3] SenseTime, Hong Kong, Peoples R China
Source
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019
DOI
10.1109/ICCV.2019.00209
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Spatiotemporal and motion features are two complementary and crucial types of information for video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn spatiotemporal features and a separate flow stream to learn motion features. In this work, we aim to efficiently encode both kinds of features in a unified 2D framework. To this end, we first propose an STM block, which contains a Channel-wise SpatioTemporal Module (CSTM) to represent the spatiotemporal features and a Channel-wise Motion Module (CMM) to efficiently encode motion features. We then replace the original residual blocks in the ResNet architecture with STM blocks to form a simple yet effective STM network that introduces very limited extra computation cost. Extensive experiments demonstrate that, by encoding spatiotemporal and motion features together, the proposed STM network outperforms the state-of-the-art methods on both temporal-related datasets (i.e., Something-Something v1 & v2 and Jester) and scene-related datasets (i.e., Kinetics-400, UCF-101, and HMDB-51).
Pages: 2000-2009
Page count: 10
Related papers
50 in total
  • [21] Nesting spatiotemporal attention networks for action recognition
    Li, Jiapeng
    Wei, Ping
    Zheng, Nanning
    NEUROCOMPUTING, 2021, 459 : 338 - 348
  • [22] Spatiotemporal wavelet correlogram for human action recognition
    Moghaddam, Hamid Abrishami
    Zare, Amin
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2019, 8 (03) : 167 - 180
  • [24] Separable ConvNet Spatiotemporal Mixer for Action Recognition
    Cheng, Hsu-Yung
    Yu, Chih-Chang
    Li, Chenyu
    ELECTRONICS, 2024, 13 (03)
  • [25] STAR: Efficient SpatioTemporal Modeling for Action Recognition
    Kumar, Abhijeet
    Abrams, Samuel
    Kumar, Abhishek
    Narayanan, Vijaykrishnan
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 42 (02) : 705 - 723
  • [26] Fast spatiotemporal MACH filter for action recognition
    Ahmed, Javed
    Abbasi, Sadaf
    Shaikh, M. Zakir
    MACHINE VISION AND APPLICATIONS, 2013, 24 (05) : 909 - 918
  • [27] Constructing Hierarchical Spatiotemporal Information for Action Recognition
    Yao, Guangle
    Zhong, Jiandan
    Lei, Tao
    Liu, Xianyuan
    2018 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI), 2018, : 596 - 602
  • [28] Spatiotemporal Fusion Networks for Video Action Recognition
    Liu, Zheng
    Hu, Haifeng
    Zhang, Junxuan
    NEURAL PROCESSING LETTERS, 2019, 50 : 1877 - 1890
  • [29] Spatiotemporal feature enhancement network for action recognition
    Huang, Guancheng
    Wang, Xiuhui
    Li, Xuesheng
    Wang, Yaru
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (19) : 57187 - 57197
  • [30] A Closer Look at Spatiotemporal Convolutions for Action Recognition
    Tran, Du
    Wang, Heng
    Torresani, Lorenzo
    Ray, Jamie
    LeCun, Yann
    Paluri, Manohar
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 6450 - 6459