Real-Time Human Detection for Aerial Captured Video Sequences via Deep Models

Cited by: 32

Authors:

AlDahoul, Nouar ^{[1]}

Sabri, Aznul Qalid Md ^{[1]}

Mansoor, Ali Mohammed ^{[1]}

Affiliations:

[1] Univ Malaya, Fac Comp Sci & Informat Technol, Kuala Lumpur, Malaysia

Source:

COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE | Year 2018 / Volume 2018

Keywords:

Computer graphics equipment - Cameras - Support vector machines - Neural networks - Graphics processing unit - Program processors;

DOI:

10.1155/2018/1639561

Chinese Library Classification (CLC) number:

Q [Biological Sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Human detection in videos plays an important role in various real-life applications. Most traditional approaches depend on handcrafted features, which are problem-dependent and optimal for specific tasks. Moreover, they are highly susceptible to dynamic events such as illumination changes, camera jitter, and variations in object size. In contrast, the proposed feature learning approaches are cheaper and easier because highly abstract and discriminative features can be produced automatically without the need for expert knowledge. In this paper, we utilize automatic feature learning methods that combine optical flow and three different deep models (i.e., supervised convolutional neural network (S-CNN), pretrained CNN feature extractor, and hierarchical extreme learning machine (H-ELM)) for human detection in videos captured using a nonstatic camera on an aerial platform with varying altitudes. The models are trained and tested on the publicly available and highly challenging UCF-ARG aerial dataset. The models are compared in terms of training and testing accuracy and learning speed. The performance evaluation considers five human actions (digging, waving, throwing, walking, and running). Experimental results demonstrate that the proposed methods are successful for the human detection task. The pretrained CNN produces an average accuracy of 98.09%. S-CNN produces an average accuracy of 95.6% with soft-max and 91.7% with Support Vector Machines (SVM). H-ELM has an average accuracy of 95.9%. Using a normal Central Processing Unit (CPU), H-ELM's training takes 445 seconds. Learning in S-CNN takes 770 seconds with a high-performance Graphical Processing Unit (GPU).
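The abstract names optical flow as the motion cue that precedes classification but does not say which algorithm is used. As a rough illustration only, the sketch below estimates the motion of one image block between two frames by block matching (minimum sum of absolute differences), which is a stand-in technique chosen here for simplicity and not the authors' method. The frames are synthetic, and the names `block_sad`, `estimate_motion`, and `make_frame` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame as rows of pixel intensities


def block_sad(prev: Frame, curr: Frame, top: int, left: int,
              dy: int, dx: int, size: int) -> int:
    """Sum of absolute differences between a block in `prev` and the
    same-sized block shifted by (dy, dx) in `curr`."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(prev[top + r][left + c]
                         - curr[top + dy + r][left + dx + c])
    return total


def estimate_motion(prev: Frame, curr: Frame, top: int, left: int,
                    size: int = 3, radius: int = 2) -> Tuple[int, int]:
    """Return the (dy, dx) shift within `radius` that best matches the block."""
    height, width = len(curr), len(curr[0])
    best_shift, best_cost = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip shifts that would move the block outside the frame.
            if not (0 <= top + dy and top + dy + size <= height):
                continue
            if not (0 <= left + dx and left + dx + size <= width):
                continue
            cost = block_sad(prev, curr, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best_shift, best_cost = (dy, dx), cost
    return best_shift


def make_frame(height: int, width: int, top: int, left: int, size: int) -> Frame:
    """Synthetic frame (assumption): dark background with one bright square."""
    return [[255 if top <= r < top + size and left <= c < left + size else 0
             for c in range(width)] for r in range(height)]


# Toy data: a 3x3 bright square moves one row down and two columns right.
prev_frame = make_frame(8, 8, top=2, left=2, size=3)
curr_frame = make_frame(8, 8, top=3, left=4, size=3)
motion = estimate_motion(prev_frame, curr_frame, top=2, left=2)
print(motion)
```

In a pipeline like the one the abstract describes, blocks with non-zero estimated motion would be candidate regions handed to a classifier. With a moving aerial camera, the background also moves, so a real system would need to account for camera motion as well.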

References

Pages: 14

28 entries in total

[1]

[Anonymous], 2015, P BRIT MACH VIS C BM

[2]

Baccouche Moez, 2011, Human Behavior Understanding. Proceedings Second International Workshop, HBU 2011, P29, DOI 10.1007/978-3-642-25446-8_4

[3] PERFORMANCE OF OPTICAL-FLOW TECHNIQUES [J].

BARRON, JL ;

FLEET, DJ ;

BEAUCHEMIN, SS .

INTERNATIONAL JOURNAL OF COMPUTER VISION, 1994, 12 (01) :43-77

[4]

Bishop Christopher M, 2016, Pattern recognition and machine learning

[5]

Casamitjana A., 2016, P MICCAI CHALL MULT

[6]

Cohen I., 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149), P319, DOI 10.1109/CVPR.1999.784651

[7]

Collobert R., 2008, P 25 ICML, P160, DOI 10.1145/1390156.1390177

[8] Histograms of oriented gradients for human detection [J].

Dalal, N ;

Triggs, B .

2005 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2005, :886-893

[9]

Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848

[10]

Dollar P., 2010, BMVC 2010, DOI 10.5244/C.24.68
