Real-Time Human Action Recognition Using CNN Over Temporal Images for Static Video Surveillance Cameras

Cited by: 22
Authors
Jin, Cheng-Bin [1 ]
Li, Shengzhe [1 ]
Trung Dung Do [1 ]
Kim, Hakil [1 ]
Affiliation
[1] Inha Univ, Informat & Commun Engn, Inchon, South Korea
Source
ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2015, PT II | 2015 / Vol. 9315
Keywords
Video surveillance; Action recognition; Temporal images; Convolutional neural network; Hierarchical action structure
DOI
10.1007/978-3-319-24078-7_33
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a real-time human action recognition approach for static video surveillance systems. The approach predicts human actions from temporal images using convolutional neural networks (CNNs), a type of deep learning model that automatically learns features from training videos. Although state-of-the-art methods achieve high accuracy, they consume substantial computational resources. Another problem is that many methods assume exact knowledge of human positions. Moreover, most current methods build complex handcrafted features for specific classifiers. These limitations make such methods difficult to apply in real-world applications. In this paper, a novel CNN model based on temporal images and a hierarchical action structure is developed for real-time human action recognition. The hierarchical action structure includes three levels: an action layer, a motion layer, and a posture layer. The top layer represents subtle actions; the bottom layer represents posture. Each layer contains one CNN, so the model consists of three CNNs working together; the layers are combined to represent many different kinds of action with a large degree of freedom. The developed approach was implemented and achieved superior performance on the ICVL action dataset; the algorithm runs at around 20 frames per second.
Pages: 330-339
Number of pages: 10
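The abstract describes a hierarchical structure in which each of the three levels (posture, motion, action) has its own CNN operating on temporal images, and the levels are combined to describe an action. Below is a minimal PyTorch sketch of that idea; the class names, layer sizes, input resolution, and numbers of classes are illustrative assumptions, not the authors' actual configuration.

```python
# Minimal sketch of a three-level hierarchical action model, assuming PyTorch.
# One small CNN per level (posture, motion, action), each fed a single-channel
# temporal image (e.g., a motion-history-style input). All hyperparameters are
# illustrative, not the configuration reported in the paper.
import torch
import torch.nn as nn


class SmallCNN(nn.Module):
    """Compact CNN used for one level of the hierarchy."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # For a 64x64 input, two 2x pooling steps leave a 32x16x16 feature map.
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x)
        return self.classifier(h.flatten(start_dim=1))


class HierarchicalActionModel(nn.Module):
    """Three CNNs, one per level; their predictions jointly describe an action."""

    def __init__(self):
        super().__init__()
        self.posture_net = SmallCNN(in_channels=1, num_classes=4)  # e.g., stand/sit/...
        self.motion_net = SmallCNN(in_channels=1, num_classes=5)   # e.g., walk/run/...
        self.action_net = SmallCNN(in_channels=1, num_classes=8)   # composite actions

    def forward(self, temporal_image: torch.Tensor):
        # In this sketch each level sees the same temporal image; combining the
        # three outputs is what gives the large degree of freedom in actions.
        return (
            self.posture_net(temporal_image),
            self.motion_net(temporal_image),
            self.action_net(temporal_image),
        )


# Usage: a batch of two single-channel 64x64 temporal images.
model = HierarchicalActionModel()
posture, motion, action = model(torch.randn(2, 1, 64, 64))
```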