Spatiotemporal neural networks for action recognition based on joint loss

Cited by: 18
Authors
Jing, Chao [1]
Wei, Ping [1]
Sun, Hongbin [1]
Zheng, Nanning [1]
Affiliation
[1] Xi An Jiao Tong Univ, Xian, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Action recognition; Spatiotemporal architecture; LSTM; Joint loss; HISTOGRAMS; ENSEMBLE; FLOW;
DOI
10.1007/s00521-019-04615-w
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Action recognition is a challenging and important problem in many fields, such as intelligent robotics and video surveillance. In recent years, deep learning and neural network techniques have been widely applied to action recognition with remarkable results. However, recognizing actions in complicated scenes, for example under varying illumination, among similar motions, or amid background noise, remains difficult. In this paper, we present a spatiotemporal neural network model with a joint loss to recognize human actions in videos. The network comprises two connected substructures. The first is a two-stream network that extracts optical-flow and appearance features from each video frame, characterizing human actions in the spatial dimension. The second is a group of Long Short-Term Memory (LSTM) structures following the spatial network, which models the temporal and transition information in the videos. We also propose a joint loss function for training the spatiotemporal network, which improves action recognition performance. The proposed method was evaluated on video samples from two challenging datasets, and the experiments show that our approach outperforms the baseline methods.
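The architecture summarized in the abstract can be illustrated with a minimal forward-pass sketch. The abstract does not specify how the two streams are fused, how many LSTMs are used, or the exact form of the joint loss, so this sketch makes several assumptions: per-frame appearance and optical-flow feature vectors are assumed to have already been extracted by the two-stream network (random vectors stand in for them here); the two features are fused by simple concatenation; a single LSTM cell models the frame sequence; and the joint loss is a hypothetical combination of cross-entropy on the last frame's prediction plus a weighted average of per-frame cross-entropies. The names `LSTMCell`, `SpatiotemporalNet`, `joint_loss`, and the weight `lam` are illustrative only and are not taken from the paper. Training (backpropagation) is not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Standard LSTM cell; each gate's weights act on the concatenation [x_t, h_{t-1}]."""

    def __init__(self, in_dim, hid_dim, rng):
        d = in_dim + hid_dim
        self.hid_dim = hid_dim
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {k: rand_matrix(hid_dim, d, rng) for k in "ifoc"}
        self.b = {k: [0.0] * hid_dim for k in "ifoc"}

    def step(self, x, h, c):
        z = x + h  # list concatenation [x_t, h_{t-1}]
        pre = {k: [u + v for u, v in zip(matvec(self.W[k], z), self.b[k])] for k in "ifoc"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def cross_entropy(probs, label):
    return -math.log(max(probs[label], 1e-12))


class SpatiotemporalNet:
    """Hypothetical sketch: fused two-stream frame features -> LSTM -> per-frame softmax."""

    def __init__(self, app_dim, flow_dim, hid_dim, n_classes, seed=0):
        rng = random.Random(seed)
        self.lstm = LSTMCell(app_dim + flow_dim, hid_dim, rng)
        self.W_out = rand_matrix(n_classes, hid_dim, rng)

    def forward(self, app_feats, flow_feats):
        """Return a list of class-probability vectors, one per frame."""
        h = [0.0] * self.lstm.hid_dim
        c = [0.0] * self.lstm.hid_dim
        per_step = []
        for a, f in zip(app_feats, flow_feats):
            x = a + f  # fuse appearance and flow by concatenation (assumption)
            h, c = self.lstm.step(x, h, c)
            per_step.append(softmax(matvec(self.W_out, h)))
        return per_step


def joint_loss(per_step, label, lam=0.5):
    """Hypothetical joint loss: last-frame CE plus lam * mean per-frame CE."""
    final = cross_entropy(per_step[-1], label)
    avg = sum(cross_entropy(p, label) for p in per_step) / len(per_step)
    return final + lam * avg


if __name__ == "__main__":
    rng = random.Random(1)
    T, APP_DIM, FLOW_DIM = 5, 8, 4
    # Random stand-ins for per-frame two-stream features (hypothetical data).
    app = [[rng.gauss(0, 1) for _ in range(APP_DIM)] for _ in range(T)]
    flow = [[rng.gauss(0, 1) for _ in range(FLOW_DIM)] for _ in range(T)]
    net = SpatiotemporalNet(APP_DIM, FLOW_DIM, hid_dim=6, n_classes=3)
    probs = net.forward(app, flow)
    print("frames:", len(probs), "joint loss:", round(joint_loss(probs, label=2), 4))
```

Supervising every frame as well as the final output is only one plausible reading of "joint loss"; a different combination can be tried by changing `joint_loss` alone, leaving the network untouched.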
Pages: 4293-4302
Number of pages: 10