Real-time action detection in video surveillance using a sub-action descriptor with multi-convolutional neural networks

Cited by: 0
Authors
Jin C.-B. [1]
Do T.D. [1]
Liu M. [1]
Kim H. [1]
Affiliations
[1] School of Information and Communication Engineering, Inha University, Incheon
Keywords
Action detection; Convolutional neural network; Multi CNN; Sub-action descriptor; Video surveillance
DOI
10.5302/J.ICROS.2018.17.0243
Abstract
When we say a person is texting, can you tell whether the person is walking or sitting? Emphatically, no. To solve this incomplete representation problem, this paper presents a sub-action descriptor for detailed action detection. The sub-action descriptor consists of three levels: posture, locomotion, and gesture. The three levels provide three sub-action categories for a single action, which addresses the representation problem. The proposed action detection model simultaneously localizes and recognizes the actions of multiple individuals in video surveillance using appearance-based temporal features with multi-convolutional neural networks. The proposed approach achieved a mean average precision of 76.6% for frame-based measurement and 83.5% for video-based measurement on the ICVL video surveillance dataset. Extensive experiments on the benchmark KTH dataset demonstrate that the proposed approach improves action recognition performance in comparison to state-of-the-art methods. The action detection model runs at around 25 fps on the ICVL dataset and at more than 80 fps on the KTH dataset, which is suitable for real-time surveillance applications. © ICROS 2018.
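The three-level descriptor in the abstract can be pictured as a small record with one label per level. The following is a minimal sketch, not the authors' implementation: the class names, the `describe` helper, and the specific label values (taken loosely from the abstract's texting, walking, and sitting example) are illustrative assumptions, and the paper's actual label sets may differ.

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical label sets; the paper's real categories are not given in the abstract.
class Posture(Enum):
    SITTING = "sitting"
    STANDING = "standing"


class Locomotion(Enum):
    STATIONARY = "stationary"
    WALKING = "walking"


class Gesture(Enum):
    NONE = "none"
    TEXTING = "texting"


@dataclass(frozen=True)
class SubActionDescriptor:
    """One label per level: posture, locomotion, and gesture."""
    posture: Posture
    locomotion: Locomotion
    gesture: Gesture

    def describe(self) -> str:
        # Join the three level labels into a readable action description.
        return f"{self.posture.value}, {self.locomotion.value}, {self.gesture.value}"


# Two people who are both "texting" still receive different descriptors.
seated = SubActionDescriptor(Posture.SITTING, Locomotion.STATIONARY, Gesture.TEXTING)
walker = SubActionDescriptor(Posture.STANDING, Locomotion.WALKING, Gesture.TEXTING)
print(seated.describe())
print(walker.describe())
```

A single "texting" label would treat both people as the same action. Keeping the three levels separate makes the difference between them explicit.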
Pages: 298-308
Page count: 10
Related papers
43 in total
  • [1] Ricci E., Varadarajan J., Subramanian R., Rota Bulo S., Ahuja N., Lanz O., Uncovering interactions and interactors: Joint estimation of head, body orientation and F-formations from surveillance videos, Proceedings of the IEEE International Conference on Computer Vision, pp. 4660-4668, (2015)
  • [2] Kim S.-H., Choi H.-L., Moving target tracking and recognition method for unmanned airborne surveillance systems, Journal of Institute of Control, Robotics and Systems (in Korean), 23, 3, pp. 157-164, (2017)
  • [3] Cho S., Shim D.H., Automatic clustering for precision reconnaissance and surveillance, Journal of Institute of Control, Robotics and Systems (in Korean), 23, 2, pp. 89-95, (2017)
  • [4] Kim I.S., Choi H.S., Yi K.M., Choi J.Y., Kong S.G., Intelligent visual surveillance-A survey, International Journal of Control, Automation and Systems, 8, 5, pp. 926-939, (2010)
  • [5] Zhang B., Wang L., Wang Z., Qiao Y., Wang H., Real-time Action Recognition with Enhanced Motion Vector CNNs, IEEE Conference on Computer Vision and Pattern Recognition, (2016)
  • [6] Jin C.-B., Li S., Do T.D., Kim H., Real-time human action recognition using CNN over temporal images for static video surveillance cameras, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9315, pp. 330-339, (2015)
  • [7] Aggarwal J., Ryoo M., Human activity analysis: A review, ACM Computing Surveys, 43, 3, pp. 16:1-16:43, (2011)
  • [8] Bobick A.F., Davis J.W., The recognition of human movement using temporal templates, IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 3, pp. 257-267, (2001)
  • [9] Davis J.W., Bobick A.F., The representation and recognition of human movement using temporal templates, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 23, 402, pp. 928-934, (1997)
  • [10] Dalal N., Triggs B., Histograms of oriented gradients for human detection, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 886-893, (2005)