SCNN: SEQUENTIAL CONVOLUTIONAL NEURAL NETWORK FOR HUMAN ACTION RECOGNITION IN VIDEOS

Cited by: 0
Authors
Yang, Hao
Yuan, Chunfeng [1]
Xing, Junliang
Hu, Weiming
Affiliations
[1] CAS Center for Excellence in Brain Science and Intelligence Technology, Beijing, People's Republic of China
Source
2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) | 2017
Keywords
SCNN; Recurrent Neural Networks; Convolutional Neural Networks; Action Recognition;
DOI
Not available
Chinese Library Classification number
TB8 [Photographic technology];
Discipline classification code
0804;
Abstract
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are two typical kinds of neural networks. While CNN models have achieved great success in image recognition thanks to their strong ability to abstract spatial information at multiple levels, RNN models have not achieved significant progress in video analysis tasks (e.g., action recognition), even though RNNs can inherently model temporal dependencies in videos. In this work, we propose a Sequential Convolutional Neural Network, denoted SCNN, to extract effective spatial-temporal features from videos, thereby combining the strengths of the convolutional and recurrent operations. Our SCNN model extends the RNN to process feature maps directly, rather than vectors flattened from feature maps, so that the spatial structure of the inputs is preserved. It replaces the full connections of the RNN with convolutional connections to reduce the number of parameters, the computational cost, and the risk of over-fitting. Moreover, we introduce asymmetric convolutional layers into SCNN to further reduce the number of parameters and the computational cost. Our final deep SCNN architecture for action recognition achieves strong performance on two challenging benchmarks, UCF-101 and HMDB-51, outperforming many state-of-the-art methods.
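To make the parameter-reduction claim concrete, the following is a rough, illustrative sketch and not the authors' implementation. It assumes an Elman-style recurrence with one input-to-hidden and one hidden-to-hidden weight set, and it assumes that "asymmetric convolution" means factorizing each k x k kernel into a 1 x k kernel followed by a k x 1 kernel; the abstract specifies neither. The function names and the example shapes (512 input channels, 256 hidden channels, 7 x 7 maps, k = 3) are hypothetical.

```python
# Illustrative parameter counts only. The recurrence form, the factorization,
# and all shapes below are assumptions, not details taken from the paper.

def fc_rnn_params(c_in, c_hid, height, width):
    """Elman-style RNN on flattened feature maps (dense weights)."""
    n_in = c_in * height * width     # length of the flattened input vector
    n_hid = c_hid * height * width   # length of the flattened hidden vector
    # input-to-hidden matrix + hidden-to-hidden matrix + bias
    return n_in * n_hid + n_hid * n_hid + n_hid


def conv_rnn_params(c_in, c_hid, k):
    """Same recurrence with both dense matrices replaced by k x k convolutions."""
    # input-to-hidden kernels + hidden-to-hidden kernels + per-channel bias
    return k * k * c_in * c_hid + k * k * c_hid * c_hid + c_hid


def asym_conv_rnn_params(c_in, c_hid, k):
    """Each k x k convolution factorized into 1 x k then k x 1 (assumed form)."""
    input_path = k * c_in * c_hid + k * c_hid * c_hid    # 1xk, then kx1
    hidden_path = k * c_hid * c_hid + k * c_hid * c_hid  # 1xk, then kx1
    return input_path + hidden_path + c_hid              # plus per-channel bias


if __name__ == "__main__":
    # Hypothetical shapes chosen only for illustration.
    print("fully connected:", fc_rnn_params(512, 256, 7, 7))
    print("convolutional:  ", conv_rnn_params(512, 256, 3))
    print("asymmetric conv:", asym_conv_rnn_params(512, 256, 3))
```

Under these assumptions the convolutional variant no longer depends on the spatial size of the feature maps, and the factorized variant scales with k rather than k squared, which is consistent with the direction of the savings the abstract describes.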
Pages: 355-359
Page count: 5