Video Foreground Extraction Using Multi-View Receptive Field and Encoder-Decoder DCNN for Traffic and Surveillance Applications

Cited: 58
Authors
Akilan, Thangarajah [1 ]
Wu, Q. M. Jonathan [2 ]
Zhang, Wandong [2 ]
Affiliations
[1] Lakehead Univ, Dept Comp Sci, Thunder Bay, ON P7B 5E1, Canada
[2] Univ Windsor, Dept Elect & Comp Engn, Windsor, ON N9B 3P4, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Surveillance; Image segmentation; Cameras; Object segmentation; Convolutional neural networks; Background subtraction; encoder-decoder network; foreground extraction; transfer learning; CONVOLUTIONAL NEURAL-NETWORK; OPTIMIZATION; TENSOR;
DOI
10.1109/TVT.2019.2937076
Chinese Library Classification
TM (Electrical engineering); TN (Electronics and communication technology);
Discipline classification codes
0808 ; 0809 ;
Abstract
The automatic detection of foreground (FG) objects in videos is a demanding area of computer vision, with essential applications in video-based traffic analysis and surveillance. New solutions have attempted to exploit deep neural networks (DNNs) for this purpose. In a DNN, learning the agents, i.e., the features for video FG object segmentation, is nontrivial, unlike in image segmentation. It is a temporally processed decision-making problem, where the agents involved are the spatial and temporal correlations of the FG objects and the background (BG) of the scene. To handle this, and to overcome conventional DL models' poor delineation at the borders of FG regions caused by fixed-view receptive field-based learning, this work introduces a Multi-view Receptive Field Encoder-Decoder Convolutional Neural Network, called MvRF-CNN. The main contribution of the model is harnessing multiple views of convolutional (conv) kernels with residual feature fusions at early, mid, and late stages in an encoder-decoder (EnDec) architecture. This enhances the model's ability to learn condition-invariant agents, resulting in more highly delineated FG masks than the existing approaches, from heuristic- to DL-based techniques. The model is trained with sequence-specific labeled samples to predict scene-specific pixel-level labels of FG objects in near-static scenes with minute dynamism. An experimental study on 37 video sequences from traffic and surveillance scenarios that include complex environments, viz. dynamic background, camera jitter, intermittent object motion, scenes with cast shadows, night videos, and lousy weather, proves the effectiveness of the model. The study covers two input configurations: a 3-channel (RGB) single frame, and a 3-channel double-frame-with-BG input in which two consecutive grayscale frames are stacked with a prior BG model.
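The double-frame input configuration described above can be sketched in a few lines; this is a minimal NumPy illustration of the stacking step only, not the authors' code, and the function and argument names are hypothetical:

```python
import numpy as np

def make_double_frame_input(prev_gray, curr_gray, bg_model):
    """Stack two consecutive grayscale frames and a prior BG model
    into a single 3-channel input of shape (H, W, 3).
    All three arrays are assumed to be uint8 images of equal size."""
    x = np.stack([prev_gray, curr_gray, bg_model], axis=-1)
    return x.astype(np.float32) / 255.0  # normalize pixel values to [0, 1]
```

The resulting (H, W, 3) tensor plays the same structural role as an RGB frame, so the same network input layer can accept either configuration.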
Ablation investigations are also conducted to show the importance of transfer learning (TL) and mid-fusion for enhancing segmentation performance, and to examine the model's robustness in failure modes: when manually annotated hard ground truths (HGT) are lacking, and when the model is tested on non-scene-specific videos. Overall, the model achieves a figure-of-merit of 95% and a mean processing speed of 42 FPS.
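The core multi-view receptive-field idea, convolving the same feature map with kernels of different sizes and fusing the aligned views by summation, can be illustrated with a toy NumPy sketch. The kernel sizes, the averaging kernels, and the centre-crop alignment here are assumptions for illustration, not the paper's exact design:

```python
import numpy as np

def conv2d(x, k):
    """Valid-mode 2-D correlation computed with explicit loops."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def multi_view_block(x):
    """Apply 3x3 and 5x5 'views' of the input in parallel,
    centre-crop the larger map so the two views align,
    then fuse them by element-wise summation (residual-style)."""
    k3 = np.ones((3, 3)) / 9.0   # illustrative averaging kernels;
    k5 = np.ones((5, 5)) / 25.0  # a real model would learn these
    f3 = conv2d(x, k3)           # shape (H-2, W-2)
    f5 = conv2d(x, k5)           # shape (H-4, W-4)
    f3_cropped = f3[1:-1, 1:-1]  # align to (H-4, W-4)
    return f3_cropped + f5
```

In the trained model the equivalent of these parallel views would use learned kernels and many channels; the sketch only shows how differently sized receptive fields can be fused into one feature map.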
Pages: 9478-9493
Page count: 16