STM: SpatioTemporal and Motion Encoding for Action Recognition

Times cited: 338

Authors:

Jiang, Boyuan [1,3]

Wang, MengMeng [2]

Gan, Weihao [2]

Wu, Wei [2]

Yan, Junjie [2]

Affiliations:

[1] Zhejiang Univ, Hangzhou, Peoples R China

[2] SenseTime Grp Ltd, Hong Kong, Peoples R China

[3] SenseTime, Hong Kong, Peoples R China

Source:

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019


DOI:

10.1109/ICCV.2019.00209

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Spatiotemporal and motion features are two complementary and crucial types of information for video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn spatiotemporal features and a separate flow stream to learn motion features. In this work, we aim to efficiently encode both kinds of features in a unified 2D framework. To this end, we first propose an STM block, which contains a Channel-wise SpatioTemporal Module (CSTM) to represent spatiotemporal features and a Channel-wise Motion Module (CMM) to efficiently encode motion features. We then replace the original residual blocks in the ResNet architecture with STM blocks to form a simple yet effective STM network that introduces very limited extra computational cost. Extensive experiments demonstrate that, by encoding spatiotemporal and motion features together, the proposed STM network outperforms state-of-the-art methods on both temporal-related datasets (i.e., Something-Something v1 & v2 and Jester) and scene-related datasets (i.e., Kinetics-400, UCF-101, and HMDB-51).
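The abstract describes the STM block only at a high level. The NumPy sketch below illustrates the two ideas under stated assumptions: CSTM is modeled as a channel-wise (depthwise) 1D convolution along the time axis, CMM as the difference between the features of adjacent frames, and the hypothetical `stm_block` fuses both by summation and adds a residual connection. The 1x1 and 2D spatial convolutions and any channel reduction of the published architecture are omitted, and the names `cstm`, `cmm`, and `stm_block` are illustrative, not the authors' code.

```python
import numpy as np


def cstm(x, kernel):
    """Assumed model of CSTM: channel-wise temporal convolution.

    x:      features shaped (T, C, H, W) for one clip.
    kernel: shape (C, k), k odd; channel c is filtered over time only with kernel[c].
    The time axis is zero-padded so the output keeps T steps.
    """
    T, C = x.shape[:2]
    kc, k = kernel.shape
    assert kc == C and k % 2 == 1, "one odd-length temporal filter per channel"
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0), (0, 0), (0, 0)))  # zero padding in time only
    out = np.zeros(x.shape, dtype=np.result_type(x, kernel))
    for t in range(T):
        window = xp[t:t + k]  # frames t-pad .. t+pad, shape (k, C, H, W)
        # Sum over the k taps separately for every channel (no cross-channel mixing).
        out[t] = np.einsum("kchw,ck->chw", window, kernel)
    return out


def cmm(x):
    """Assumed model of CMM: feature difference between frame t+1 and frame t.

    A zero map is appended at the last time step so the output keeps T steps.
    """
    diff = x[1:] - x[:-1]
    return np.concatenate([diff, np.zeros_like(x[:1])], axis=0)


def stm_block(x, kernel):
    """Hypothetical residual block: identity + CSTM output + CMM output (sum fusion assumed)."""
    return x + cstm(x, kernel) + cmm(x)


# Usage: a random clip with T=8 frames, C=4 channels, 7x7 spatial maps.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 7, 7))
identity = np.zeros((4, 3))
identity[:, 1] = 1.0  # centre tap only: CSTM reduces to the identity
y = stm_block(x, identity)
print(y.shape)  # (8, 4, 7, 7)
```

Filtering each channel independently keeps the temporal part at C*k weights per layer instead of C*C*k for a full temporal convolution, which is one way to read the abstract's claim of very limited extra computational cost.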


Pages: 2000-2009

Page count: 10
