3D-CNN-Based Fused Feature Maps with LSTM Applied to Action Recognition

被引:41
作者
Arif, Sheeraz [1 ]
Wang, Jing [1 ]
Ul Hassan, Tehseen [1 ]
Fei, Zesong [1 ]
机构
[1] Beijing Inst Technol, Informat & Commun Engn, Beijing 100081, Peoples R China
关键词
action recognition; fused features; 3D convolution neural network; motion map; long short-term-memory; DENSE;
D O I
10.3390/fi11020042
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Human activity recognition is an active field of research in computer vision with numerous applications. Recently, deep convolutional networks and recurrent neural networks (RNN) have received increasing attention in multimedia studies, and have yielded state-of-the-art results. In this research work, we propose a new framework which intelligently combines 3D-CNN and LSTM networks. First, we integrate discriminative information from a video into a map called a 'motion map' by using a deep 3-dimensional convolutional network (C3D). A motion map and the next video frame can be integrated into a new motion map, and this technique can be trained by increasing the training video length iteratively; then, the final acquired network can be used for generating the motion map of the whole video. Next, a linear weighted fusion scheme is used to fuse the network feature maps into spatio-temporal features. Finally, we use a Long-Short-Term-Memory (LSTM) encoder-decoder for final predictions. This method is simple to implement and retains discriminative and dynamic information. The improved results on benchmark public datasets prove the effectiveness and practicability of the proposed method.
引用
收藏
页数:17
相关论文
共 49 条
[1]  
[Anonymous], 2016, ARXIV160800859
[2]  
BILEN H, 2016, PROC CVPR IEEE, P3034, DOI DOI 10.1109/CVPR.2016.331
[3]   Real-time human action recognition based on depth motion maps [J].
Chen, Chen ;
Liu, Kui ;
Kehtarnavaz, Nasser .
JOURNAL OF REAL-TIME IMAGE PROCESSING, 2016, 12 (01) :155-163
[4]  
Dollar P., 2005, VISUAL SURVEILLANCE, V14, P65, DOI DOI 10.1109/VSPETS.2005.1570899
[5]   Long-Term Recurrent Convolutional Networks for Visual Recognition and Description [J].
Donahue, Jeff ;
Hendricks, Lisa Anne ;
Rohrbach, Marcus ;
Venugopalan, Subhashini ;
Guadarrama, Sergio ;
Saenko, Kate ;
Darrell, Trevor .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (04) :677-691
[6]   Learning Spatiotemporal Features with 3D Convolutional Networks [J].
Du Tran ;
Bourdev, Lubomir ;
Fergus, Rob ;
Torresani, Lorenzo ;
Paluri, Manohar .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :4489-4497
[7]   Convolutional Two-Stream Network Fusion for Video Action Recognition [J].
Feichtenhofer, Christoph ;
Pinz, Axel ;
Zisserman, Andrew .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :1933-1941
[8]  
Fernando B, 2015, PROC CVPR IEEE, P5378, DOI 10.1109/CVPR.2015.7299176
[9]  
Graves A, 2012, STUD COMPUT INTELL, V385, P1, DOI [10.1162/neco.1997.9.1.1, 10.1007/978-3-642-24797-2]
[10]   3D Convolutional Neural Networks for Human Action Recognition [J].
Ji, Shuiwang ;
Xu, Wei ;
Yang, Ming ;
Yu, Kai .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2013, 35 (01) :221-231