A deep neural network for real-time detection of falling humans in naturally occurring scenes

Cited by: 77

Authors:

Fan, Yaxiang [1]

Levine, Martin D. [2]

Wen, Gongjian [1]

Qiu, Shaohua [1]

Affiliations:

[1] Natl Univ Def Technol, Sci & Technol Automat Target Recognit Lab ATR, Changsha, Hunan, Peoples R China

[2] McGill Univ, Ctr Intelligent Machines, Dept Elect & Comp Engn, 3480 Univ St, Montreal, PQ, Canada

Source:

NEUROCOMPUTING | 2017 / Volume 260

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

Fall detection; Action detection; Temporal location; Dynamic image; Convolutional neural network; Deep learning; SIMULATED DATA; SURVEILLANCE; RECOGNITION; MOTION;

DOI:

10.1016/j.neucom.2017.02.082

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We introduce a novel approach to the problem of human fall detection in naturally occurring scenes. This is important because falling incidents cause thousands of deaths every year and vision-based approaches offer a promising and effective way to detect falls. To address this challenging issue, we regard it as an example of action detection and propose to also locate its temporal extent. We achieve this by exploiting the effectiveness of deep networks. In the training stage, the trimmed video clips of four phases (standing, falling, fallen and not moving) in a fall are converted to four categories of so-called dynamic image to train a deep ConvNet that scores and predicts the label of each dynamic image. In the testing stage, a set of sub-videos is generated using a sliding window on an untrimmed video that converts it to multiple dynamic images. Based on the predicted label of each dynamic image by the trained deep ConvNet, the videos are classified as falling or not by a "standing watch" for a situation consisting of the four sequential phases. In order to localize the temporal extent of the event, we propose a difference score method (DSM) based on adjacent dynamic images in the temporal sequence. We collect a new dataset, called the YouTube Fall Dataset (YTFD), which contains 430 falling incidents and 176 normal activities and use it to learn the deep network to detect falling humans. We perform experiments on datasets of varying complexity: Le2i fall detection dataset, multiple cameras fall dataset, high quality fall simulation dataset and our own YouTube Fall Dataset. The results demonstrate the effectiveness and efficiency of our approach. (C) 2017 Elsevier B.V. All rights reserved.
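The testing stage described in the abstract can be illustrated with a minimal sketch. It assumes that each sliding-window sub-video has already been converted to a dynamic image and labelled by a trained classifier, which is not shown. The names `sliding_windows` and `contains_fall`, the label strings, and the rule that the four phases must simply appear in order are illustrative assumptions; the abstract does not specify the authors' exact decision rule.

```python
from typing import List, Sequence, Tuple

# Phase labels in the order the abstract lists them (label strings are assumed).
PHASES = ("standing", "falling", "fallen", "not_moving")


def sliding_windows(num_frames: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs, end exclusive, one per sub-video."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [(s, s + window) for s in range(0, num_frames - window + 1, stride)]


def contains_fall(labels: Sequence[str]) -> bool:
    """Return True if the four phases occur in order in the label sequence.

    Assumption: repeated or unrelated labels between phases are allowed.
    """
    next_phase = 0
    for label in labels:
        if label == PHASES[next_phase]:
            next_phase += 1
            if next_phase == len(PHASES):
                return True
    return False
```

For example, `sliding_windows(10, 4, 2)` yields four overlapping sub-videos, and a label sequence that skips from "standing" directly to "fallen" without a "falling" window is not reported as a fall under this rule.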


Pages: 43-58

Page count: 16
