STM: SpatioTemporal and Motion Encoding for Action Recognition

Cited by: 337

Authors:

Jiang, Boyuan ^{[1,3]}

Wang, MengMeng ^{[2]}

Gan, Weihao ^{[2]}

Wu, Wei ^{[2]}

Yan, Junjie ^{[2]}

Affiliations:

[1] Zhejiang Univ, Hangzhou, Peoples R China

[2] SenseTime Grp Ltd, Hong Kong, Peoples R China

[3] SenseTime, Hong Kong, Peoples R China

Source:

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019

Keywords:

DOI:

10.1109/ICCV.2019.00209

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Spatiotemporal and motion features are two complementary and crucial types of information for video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn spatiotemporal features and another flow stream to learn motion features. In this work, we aim to efficiently encode these two features in a unified 2D framework. To this end, we first propose an STM block, which contains a Channel-wise SpatioTemporal Module (CSTM) to present the spatiotemporal features and a Channel-wise Motion Module (CMM) to efficiently encode motion features. We then replace the original residual blocks in the ResNet architecture with STM blocks to form a simple yet effective STM network that introduces very limited extra computation cost. Extensive experiments demonstrate that the proposed STM network outperforms the state-of-the-art methods on both temporal-related datasets (i.e., Something-Something v1 & v2 and Jester) and scene-related datasets (i.e., Kinetics-400, UCF-101, and HMDB-51) with the help of encoding spatiotemporal and motion features together.


Pages: 2000-2009

Page count: 10
