Video Foreground Extraction Using Multi-View Receptive Field and Encoder-Decoder DCNN for Traffic and Surveillance Applications

Cited by: 58
Authors
Akilan, Thangarajah [1 ]
Wu, Q. M. Jonathan [2 ]
Zhang, Wandong [2 ]
Affiliations
[1] Lakehead Univ, Dept Comp Sci, Thunder Bay, ON P7B 5E1, Canada
[2] Univ Windsor, Dept Elect & Comp Engn, Windsor, ON N9B 3P4, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Surveillance; Image segmentation; Cameras; Object segmentation; Convolutional neural networks; Background subtraction; encoder-decoder network; foreground extraction; transfer learning; CONVOLUTIONAL NEURAL-NETWORK; OPTIMIZATION; TENSOR;
DOI
10.1109/TVT.2019.2937076
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Code
0808 ; 0809 ;
Abstract
The automatic detection of foreground (FG) objects in videos is a demanding area of computer vision, with essential applications in video-based traffic analysis and surveillance. Recent solutions have attempted to exploit deep neural networks (DNNs) for this purpose. In a DNN, learning the agents, i.e., the features for video FG object segmentation, is nontrivial, unlike in image segmentation: it is a temporally evolving decision-making problem, where the agents involved are the spatial and temporal correlations of the FG objects and the background (BG) of the scene. To handle this, and to overcome conventional DL models' poor delineation at the borders of FG regions caused by fixed-view receptive-field-based learning, this work introduces a Multi-view Receptive Field Encoder-Decoder Convolutional Neural Network, called MvRF-CNN. The main contribution of the model is harnessing multiple views of convolutional (conv) kernels with residual feature fusions at early, mid, and late stages of an encoder-decoder (EnDec) architecture. This enhances the model's ability to learn condition-invariant agents, yielding more sharply delineated FG masks than existing approaches, from heuristic- to DL-based techniques. The model is trained with sequence-specific labeled samples to predict scene-specific pixel-level labels of FG objects in near-static scenes with minimal dynamism. An experimental study on 37 video sequences from traffic and surveillance scenarios covering complex environments, viz. dynamic backgrounds, camera jitter, intermittent object motion, cast shadows, night videos, and bad weather, demonstrates the effectiveness of the model. The study covers two input configurations: a 3-channel (RGB) single frame, and a 3-channel double-frame with BG, in which two consecutive grayscale frames are stacked with a prior BG model. Ablation investigations are also conducted to show the importance of transfer learning (TL) and mid-fusion for improving segmentation performance, and to assess the model's robustness in two failure modes: a lack of manually annotated hard ground truths (HGT) and testing on non-scene-specific videos. Overall, the model achieves a figure-of-merit of 95% and runs at 42 FPS on average.
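As a rough illustration of the two ideas the abstract describes, the sketch below shows (1) the 3-channel double-frame-with-BG input, i.e., two consecutive grayscale frames stacked with a prior BG model, and (2) a multi-view receptive-field block, i.e., parallel conv kernels of different sizes fused with a residual shortcut, inside a toy encoder-decoder. This is a minimal PyTorch-style sketch, not the authors' implementation; the kernel sizes, layer counts, and all names are illustrative assumptions.

```python
# Minimal sketch (assumed structure, not the paper's released code).
import torch
import torch.nn as nn


def stack_double_frame_with_bg(frame_prev, frame_curr, bg_model):
    """Stack two consecutive grayscale frames with a prior BG model:
    three (H, W) tensors -> one (3, H, W) input tensor."""
    return torch.stack([frame_prev, frame_curr, bg_model], dim=0)


class MultiViewBlock(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 convolutions ("views") whose responses
    are summed and fused with a residual 1x1 shortcut (sizes assumed)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.view1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.view3 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.view5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2)
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        fused = self.view1(x) + self.view3(x) + self.view5(x)
        return self.act(fused + self.skip(x))  # residual feature fusion


class TinyEnDec(nn.Module):
    """Toy encoder-decoder producing a per-pixel FG probability map."""

    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(MultiViewBlock(3, 16), nn.MaxPool2d(2))
        self.dec = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            MultiViewBlock(16, 16),
            nn.Conv2d(16, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.dec(self.enc(x))


# Example: one 240x320 double-frame-with-BG sample -> FG mask of same size.
f_prev, f_curr, bg = (torch.rand(240, 320) for _ in range(3))
x = stack_double_frame_with_bg(f_prev, f_curr, bg).unsqueeze(0)  # (1, 3, 240, 320)
mask = TinyEnDec()(x)                                            # (1, 1, 240, 320)
```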
Pages: 9478-9493
Page count: 16