Short-Term Action Recognition by 3D Convolutional Neural Network with Pixel-Wise Evidences

Cited by: 0
Authors
Wang, XiaoHan [1]
Miyao, Junichi [1]
Kurita, Takio [1]
Affiliations
[1] Hiroshima Univ, Dept Informat Engn, Hiroshima, Japan
Source
FRONTIERS OF COMPUTER VISION | 2020 / Vol. 1212
Keywords
Action recognition; Autoencoder; 3D convolution
DOI
10.1007/978-981-15-4818-5_6
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Action recognition in videos has become popular in recent years. The main difficulty is extracting the temporal information that characterizes the target actions. In this paper, we propose a conceptually simple network for short-term action recognition. The proposed architecture extends a standard neural network to an autoencoder that estimates pixel-wise evidence in each frame; this evidence is then integrated by a simple classifier to classify the actions. The standard 2D convolutional layers used for image classification are extended to 3D convolutional layers in the autoencoder to extract the temporal information of the target actions. In the training phase, classifiers are introduced at the middle layers so that the features of those layers are well discriminated. Classifiers are also introduced at the end of the network to improve the performance of the standard classifier. We performed experiments on the UCF101 dataset to evaluate the effectiveness of the proposed architecture. The results show that our method performs well in short-term action recognition.
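The two ideas named in the abstract, extending 2D convolution along the time axis and integrating pixel-wise evidence into a class decision, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `conv3d_valid` and `integrate_evidence`, the single-channel clip layout `clip[t][y][x]`, the temporal-difference kernel, and the use of a plain per-class mean as the "simple classifier" are all assumptions made for illustration.

```python
def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D cross-correlation over (time, height, width).

    clip[t][y][x] and kernel[dt][dy][dx] are nested lists (hypothetical layout).
    A 2D convolution is the special case where the kernel has temporal depth 1.
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over the temporal extent as well as the spatial window.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def integrate_evidence(evidence):
    """Average pixel-wise scores per class and pick the best class.

    evidence[c][t][y][x] holds one score map per class (assumed layout).
    Returns (predicted_class_index, per_class_means).
    """
    means = []
    for class_map in evidence:
        values = [v for plane in class_map for row in plane for v in row]
        means.append(sum(values) / len(values))
    predicted = max(range(len(means)), key=means.__getitem__)
    return predicted, means


# A bright pixel moving across three 2x2 frames.
moving_clip = [
    [[1, 0], [0, 0]],
    [[0, 1], [0, 0]],
    [[0, 0], [0, 1]],
]
# Temporal-difference kernel: depth 2 in time, 1x1 in space.
temporal_diff = [[[-1]], [[1]]]

motion = conv3d_valid(moving_clip, temporal_diff)

# Toy per-class evidence maps for a single frame and two classes.
toy_evidence = [
    [[[0.1, 0.2], [0.3, 0.4]]],
    [[[0.5, 0.6], [0.7, 0.8]]],
]
predicted_class, class_means = integrate_evidence(toy_evidence)
```

With a kernel of temporal depth 2, the output responds only where pixel values change between consecutive frames, which a purely spatial 2D kernel applied frame by frame cannot capture. Averaging is just one possible way to pool pixel-wise evidence; the record does not specify which integration the paper uses.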
Pages: 69-82
Page count: 14