STM: SpatioTemporal and Motion Encoding for Action Recognition

Cited by: 338
Authors
Jiang, Boyuan [1, 3]
Wang, MengMeng [2]
Gan, Weihao [2]
Wu, Wei [2]
Yan, Junjie [2]
Affiliations
[1] Zhejiang Univ, Hangzhou, Peoples R China
[2] SenseTime Grp Ltd, Hong Kong, Peoples R China
[3] SenseTime, Hong Kong, Peoples R China
Source
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019
Keywords
DOI
10.1109/ICCV.2019.00209
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Spatiotemporal and motion features are two complementary and crucial types of information for video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn spatiotemporal features and a separate optical flow stream to learn motion features. In this work, we aim to encode both features efficiently in a unified 2D framework. To this end, we first propose an STM block, which contains a Channel-wise SpatioTemporal Module (CSTM) to represent the spatiotemporal features and a Channel-wise Motion Module (CMM) to efficiently encode motion features. We then replace the original residual blocks in the ResNet architecture with STM blocks to form a simple yet effective STM network that introduces very limited extra computational cost. Extensive experiments demonstrate that, by encoding spatiotemporal and motion features together, the proposed STM network outperforms the state-of-the-art methods on both temporal-related datasets (i.e., Something-Something v1 & v2 and Jester) and scene-related datasets (i.e., Kinetics-400, UCF-101, and HMDB-51).
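The two branches named in the abstract can be illustrated with a minimal NumPy sketch. It assumes a single clip stored as an array of shape (T, C, H, W) and shows only the core idea of each branch: a channel-wise (depthwise) 1D convolution along time for the spatiotemporal branch, and a frame-to-frame feature difference for the motion branch. The function names `channelwise_temporal_conv`, `motion_difference` and `stm_like_block`, the kernel size, and the simple summation with an identity shortcut are hypothetical illustrations. This is not the authors' implementation: the learned spatial and pointwise convolutions of the real CSTM, CMM and STM block are omitted.

```python
import numpy as np


def channelwise_temporal_conv(x, kernels):
    """Depthwise 1D convolution along the time axis (CSTM-style idea, simplified).

    x: array of shape (T, C, H, W); kernels: array of shape (C, K) with odd K.
    Each channel c is filtered over time with its own kernel kernels[c],
    using zero padding so the output keeps T frames.
    """
    T = x.shape[0]
    K = kernels.shape[1]
    pad = K // 2
    # Zero-pad only the temporal axis.
    xp = np.pad(x, ((pad, pad), (0, 0), (0, 0), (0, 0)))
    out = np.zeros(x.shape, dtype=float)
    for k in range(K):
        # kernels[:, k] has shape (C,); broadcast it over H and W.
        out += kernels[:, k][None, :, None, None] * xp[k:k + T]
    return out


def motion_difference(x):
    """Feature difference between adjacent frames (CMM-style idea, simplified).

    out[t] = x[t + 1] - x[t]; the last frame has no successor and is set to zero.
    """
    diff = np.zeros(x.shape, dtype=float)
    diff[:-1] = x[1:] - x[:-1]
    return diff


def stm_like_block(x, kernels):
    """Hypothetical fusion: identity shortcut plus the sum of both branches."""
    return x + channelwise_temporal_conv(x, kernels) + motion_difference(x)


if __name__ == "__main__":
    clip = np.random.rand(8, 4, 7, 7)            # T=8 frames, C=4 channels
    temporal_kernels = np.full((4, 3), 1.0 / 3)  # assumed kernel size 3 per channel
    print(stm_like_block(clip, temporal_kernels).shape)
```

Because the temporal filter is per-channel and the motion branch is a plain subtraction, both operations stay cheap relative to a full 3D convolution, which is in the spirit of the abstract's claim of very limited extra computational cost.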
Pages: 2000-2009
Number of pages: 10