Human Action Recognition Using Fusion of Modern Deep Convolutional and Recurrent Neural Networks

Cited by: 0
Authors
Tkachenko, Dmytro [1]
Affiliations
[1] Natl Tech Univ Ukraine, Igor Sikorsky Kyiv Polytech Inst, Inst Appl Syst Anal, Kiev, Ukraine
Source
2018 IEEE FIRST INTERNATIONAL CONFERENCE ON SYSTEM ANALYSIS & INTELLIGENT COMPUTING (SAIC) | 2018
Keywords
convolutional neural networks; human action recognition; recurrent neural networks; representation learning; video classification
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
This paper studies the application of modern deep convolutional and recurrent neural networks to video classification, specifically human action recognition. It proposes a multi-stream architecture that uses ideas from representation learning to extract embeddings of multimodal features. The architecture is built on 2D convolutional and recurrent neural networks, and its fusion model takes a video embedding as input, so classification is performed on a compact representation of spatial, temporal, and audio information. The proposed architecture achieves 93.1% accuracy on UCF101, outperforming models with a similar architecture, and it also produces representations that other models can use as features; anomaly detection with autoencoders is proposed as an example.
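The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the per-stream embeddings are combined, how large they are, or what the fusion model looks like, so everything below is an assumption made for illustration: the function names (`fuse_embeddings`, `classify`), the use of plain concatenation, the toy embedding sizes, and the single linear layer with softmax. The only detail taken from outside the sketch is that UCF101 has 101 action classes.

```python
import math
import random


def fuse_embeddings(spatial, temporal, audio):
    """Concatenate per-stream embeddings into one video embedding.

    Concatenation is an assumed fusion rule, not the paper's stated method.
    """
    return list(spatial) + list(temporal) + list(audio)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(video_embedding, weights, biases):
    """Hypothetical fusion model: one linear layer followed by softmax.

    `weights` holds one row per class, each as long as the embedding.
    """
    logits = [
        sum(w * x for w, x in zip(row, video_embedding)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Toy stand-ins for the outputs of the spatial, temporal, and audio streams.
rng = random.Random(0)
spatial = [rng.uniform(-1, 1) for _ in range(4)]
temporal = [rng.uniform(-1, 1) for _ in range(3)]
audio = [rng.uniform(-1, 1) for _ in range(2)]

video_embedding = fuse_embeddings(spatial, temporal, audio)

# Randomly initialised (untrained) classifier over the 101 UCF101 classes.
num_classes = 101
weights = [
    [rng.gauss(0.0, 0.1) for _ in video_embedding] for _ in range(num_classes)
]
biases = [0.0] * num_classes

probs = classify(video_embedding, weights, biases)
predicted_class = max(range(num_classes), key=probs.__getitem__)
```

Because the weights here are untrained, `predicted_class` carries no meaning; the sketch only shows the data flow from three stream embeddings to a single compact vector and then to class probabilities. The same `video_embedding` vector is the kind of representation that a downstream model, such as the autoencoder-based anomaly detector the abstract mentions, could consume as features.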
Pages: 181-185
Page count: 5