Multi-stream 3D CNN structure for human action recognition trained by limited data

Cited by: 26
Authors
Chenarlogh, Vahid Ashkani [1]
Razzazi, Farbod [1]
Affiliations
[1] Islamic Azad Univ, Sci & Res Branch, Dept Elect & Comp Engn, Tehran, Iran
Keywords
object recognition; image motion analysis; image classification; cameras; feature extraction; learning (artificial intelligence); video signal processing; image sequences; convolutional neural nets; multistream 3D CNN structure; human action recognition; training performance; training data case; optical flows; vertical directions; three-dimensional CNNs; four-stream 3D CNNs; single-stream model; two-stream architecture; four-stream architecture; information channels; separate streams; action recognition system; data set; four-stream structure; convolutional neural network architectures; optical flow; recognition rate; IXMAS; FEATURES;
DOI
10.1049/iet-cvi.2018.5088
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Here, the authors proposed a solution to improve training performance for human action recognition when training data are limited. First, the authors generated four channels of information from each frame, namely optical flow and gradients in the horizontal and vertical directions, as input to three-dimensional (3D) convolutional neural networks (CNNs). They then proposed three architectures: single-stream, two-stream, and four-stream 3D CNNs. In the single-stream model, all four channels from each frame were applied to a single stream. In the two-stream architecture, optical flow-x and optical flow-y were applied to one stream and gradient-x and gradient-y to another. In the four-stream architecture, each of the four channels was applied to its own stream. The architectures were evaluated in an action recognition system on IXMAS, a data set recorded simultaneously by five cameras. The four-stream architecture outperformed the other two, achieving recognition rates of 87.5, 91.66, 91.11, 88.05, and 81.94% for cameras 0-4, respectively (88.05% on average).
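
The way the four channels are routed to streams in the three architectures can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the channel-first (C, T, H, W) array layout, and the clip size are hypothetical; the optical flow is assumed to be precomputed and is replaced here by zero placeholders; and the gradients use NumPy's central differences, which may differ from the gradient operator used in the paper.

```python
import numpy as np


def gradient_channels(frames):
    """Return (gradient-x, gradient-y) for a grayscale clip of shape (T, H, W).

    Assumption: central differences via np.gradient stand in for whatever
    gradient operator the paper actually uses.
    """
    # axis 1 is the vertical (row) direction, axis 2 the horizontal (column) one
    grad_y, grad_x = np.gradient(frames.astype(np.float32), axis=(1, 2))
    return grad_x, grad_y


def build_stream_inputs(flow_x, flow_y, grad_x, grad_y, mode):
    """Group the four channels into one input array per stream.

    Each returned array has shape (C, T, H, W), where C is the number of
    channels routed to that stream (layout is an assumption).
    """
    channels = {"flow_x": flow_x, "flow_y": flow_y,
                "grad_x": grad_x, "grad_y": grad_y}
    groups = {
        # single-stream: all four channels go to one stream
        "single": [["flow_x", "flow_y", "grad_x", "grad_y"]],
        # two-stream: optical flow in one stream, gradients in the other
        "two": [["flow_x", "flow_y"], ["grad_x", "grad_y"]],
        # four-stream: every channel gets its own stream
        "four": [["flow_x"], ["flow_y"], ["grad_x"], ["grad_y"]],
    }
    return [np.stack([channels[name] for name in group], axis=0)
            for group in groups[mode]]


# Hypothetical clip: 8 grayscale frames of 16 x 16 pixels.
rng = np.random.default_rng(0)
clip = rng.random((8, 16, 16)).astype(np.float32)

gx, gy = gradient_channels(clip)
# Placeholder optical flow; a real pipeline would estimate it between frames.
fx = np.zeros_like(clip)
fy = np.zeros_like(clip)

for mode in ("single", "two", "four"):
    shapes = [s.shape for s in build_stream_inputs(fx, fy, gx, gy, mode)]
    print(mode, shapes)
```

Each array in the returned list would feed one 3D CNN stream; how the stream outputs are fused before classification is not specified in the abstract and is left out of this sketch.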
Pages: 338-344
Number of pages: 7