Motion Feature Network: Fixed Motion Filter for Action Recognition

Cited by: 89
Authors
Lee, Myunggi [1, 2]
Lee, Seungeui [1]
Son, Sungjoon [1, 2]
Park, Gyutae [1, 2]
Kwak, Nojun [1]
Affiliations
[1] Seoul Natl Univ, Seoul, South Korea
[2] VDO Inc, Suwon, South Korea
Source
COMPUTER VISION - ECCV 2018, PT X | 2018 / Vol. 11214
Keywords
Action recognition; Motion filter; MFNet; Spatio-temporal representation
DOI
10.1007/978-3-030-01249-6_24
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Spatio-temporal representations in frame sequences play an important role in action recognition. Previously, using optical flow as temporal information in combination with a set of RGB images that contain spatial information has shown great performance enhancement in action recognition tasks. However, this approach is computationally expensive and requires a two-stream (RGB and optical flow) framework. In this paper, we propose MFNet (Motion Feature Network), which contains motion blocks that make it possible to encode spatio-temporal information between adjacent frames in a unified network that can be trained end-to-end. The motion block can be attached to any existing CNN-based action recognition framework with only a small additional cost. We evaluated our network on two action recognition datasets (Jester and Something-Something) and achieved competitive performance on both datasets by training the networks from scratch.
Pages: 392-408
Page count: 17