STM: SpatioTemporal and Motion Encoding for Action Recognition

Cited by: 367
Authors
Jiang, Boyuan [1, 3]
Wang, MengMeng [2 ]
Gan, Weihao [2 ]
Wu, Wei [2 ]
Yan, Junjie [2 ]
Affiliations
[1] Zhejiang Univ, Hangzhou, Peoples R China
[2] SenseTime Grp Ltd, Hong Kong, Peoples R China
[3] SenseTime, Hong Kong, Peoples R China
Source
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019
DOI
10.1109/ICCV.2019.00209
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Spatiotemporal and motion features are two complementary and crucial types of information for video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn spatiotemporal features and a separate flow stream to learn motion features. In this work, we aim to efficiently encode both kinds of features in a unified 2D framework. To this end, we first propose an STM block, which contains a Channel-wise SpatioTemporal Module (CSTM) to represent the spatiotemporal features and a Channel-wise Motion Module (CMM) to efficiently encode motion features. We then replace the original residual blocks in the ResNet architecture with STM blocks to form a simple yet effective STM network that introduces very limited extra computation cost. Extensive experiments demonstrate that, by encoding spatiotemporal and motion features together, the proposed STM network outperforms state-of-the-art methods on both temporal-related datasets (i.e., Something-Something v1 & v2 and Jester) and scene-related datasets (i.e., Kinetics-400, UCF-101, and HMDB-51).
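The abstract does not spell out how the CMM computes motion features, so the following is only a minimal sketch of the general idea of feature-level motion encoding, under a loudly stated assumption: motion is approximated as the per-channel difference between the features of adjacent frames, with the last time step zero-padded so the temporal length is preserved. The function name `channelwise_temporal_difference` is hypothetical, spatial dimensions are collapsed to one value per channel, and the learned convolutions of the actual module are omitted. It should not be read as the authors' implementation.

```python
from typing import List

# One feature value per channel; spatial dimensions are collapsed for brevity.
Frame = List[float]


def channelwise_temporal_difference(frames: List[Frame]) -> List[Frame]:
    """Approximate motion as per-channel differences between adjacent frames.

    Assumption (not taken from the abstract): motion at step t is
    features[t + 1] - features[t], and the final step is zero-padded so the
    output has the same number of time steps as the input.
    """
    if not frames:
        return []
    num_channels = len(frames[0])
    if any(len(frame) != num_channels for frame in frames):
        raise ValueError("all frames must have the same number of channels")

    # Difference each channel independently across consecutive time steps.
    motion = [
        [nxt - cur for cur, nxt in zip(frames[t], frames[t + 1])]
        for t in range(len(frames) - 1)
    ]
    # Zero-pad the last step to keep T time steps.
    motion.append([0.0] * num_channels)
    return motion


# Three frames, two channels: channel 0 changes over time, channel 1 is static.
features = [[1.0, 0.0], [3.0, 0.0], [6.0, 0.0]]
motion = channelwise_temporal_difference(features)
```

In this toy input only channel 0 varies across frames, so only channel 0 carries a non-zero motion signal; working on features rather than on precomputed optical flow is what lets a single 2D network avoid a separate flow stream.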
Pages: 2000-2009
Page count: 10