A 3D CNN-LSTM-Based Image-to-Image Foreground Segmentation

Cited by: 119
Authors
Akilan, Thangarajah [1 ]
Wu, Qingming Jonathan [1 ]
Safaei, Amin [2 ]
Huo, Jie [1 ]
Yang, Yimin [3 ]
Affiliations
[1] Univ Windsor, Dept Elect & Comp Engn, Windsor, ON N9B 3P4, Canada
[2] Toronto Micro Elect Inc, Mississauga, ON L5T 2H7, Canada
[3] Lakehead Univ, Comp Sci Dept, Thunder Bay, ON P7B 5E1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Three-dimensional displays; Solid modeling; Image segmentation; Visualization; Decoding; Encoding; Computational modeling; Deep learning; foreground-background segmentation; intelligent systems; LSTM; spatiotemporal cues; CONVOLUTIONAL NEURAL-NETWORKS; SELECTION;
DOI
10.1109/TITS.2019.2900426
CLC Number
TU [Architecture Science];
Subject Classification
0813;
Abstract
The video-based separation of foreground (FG) and background (BG) has been widely studied due to its vital role in many applications, including intelligent transportation and video surveillance. Most existing algorithms rely on traditional computer vision techniques that perform pixel-level processing, assuming that FG and BG possess distinct visual characteristics. Recent state-of-the-art solutions exploit deep learning models originally targeted at image classification. The major drawback of this strategy is poor delineation of FG regions: because the FG is segmented via single-frame object detection, temporal information is lost. To address this issue, we design a 3D convolutional neural network (3D CNN) with long short-term memory (LSTM) pipelines that harnesses seminal ideas, namely fully convolutional networking, 3D transpose convolution, and residual feature flows. An FG-BG segmenter is then implemented in an encoder-decoder fashion and trained on representative FG-BG segments. The model employs a strategy called double encoding and slow decoding, which fuses the learned spatio-temporal cues with appropriate feature maps in both the down-sampling and up-sampling paths to achieve a well-generalized FG object representation. Finally, from the sigmoid confidence map generated by the 3D CNN-LSTM model, the FG is identified automatically using Nobuyuki Otsu's method and an empirical global threshold. Analysis of experimental results via standard quantitative metrics on 16 benchmark datasets, covering both indoor and outdoor scenes, validates that the proposed 3D CNN-LSTM achieves competitive figure-of-merit performance against prior and state-of-the-art methods. In addition, a failure analysis is conducted on 20 video sequences from the DAVIS 2016 dataset.
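The post-processing step the abstract describes, binarizing the model's sigmoid confidence map with Otsu's method combined with an empirical global threshold, can be sketched in plain NumPy. The abstract does not specify how the two thresholds are fused or the value of the global threshold, so taking the maximum of the two with a 0.5 floor is an assumption here, not the paper's exact rule.

```python
import numpy as np

def otsu_threshold(confidence, bins=256):
    """Otsu's method over a [0, 1] sigmoid confidence map.

    Returns the histogram-bin centre that maximizes the
    between-class variance between FG and BG intensities.
    """
    hist, edges = np.histogram(confidence.ravel(), bins=bins, range=(0.0, 1.0))
    prob = hist.astype(np.float64) / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(prob)              # cumulative BG-class weight
    w1 = 1.0 - w0                     # remaining FG-class weight
    mu = np.cumsum(prob * centers)    # cumulative mean
    mu_t = mu[-1]                     # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * w0 - mu) ** 2 / (w0 * w1)  # between-class variance
    sigma_b[~np.isfinite(sigma_b)] = 0.0             # guard empty classes
    return centers[int(np.argmax(sigma_b))]

def segment_foreground(confidence, global_floor=0.5):
    # Assumed fusion of Otsu's adaptive threshold with the paper's
    # "empirical global threshold": use whichever is stricter.
    t = max(otsu_threshold(confidence), global_floor)
    return confidence >= t
```

On a clearly bimodal confidence map, `otsu_threshold` lands between the FG and BG modes, so `segment_foreground` recovers the high-confidence pixels as the FG mask.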
Pages: 959-971
Page count: 13
References
69 records total
[1]  
Akilan T, 2017, IEEE SYS MAN CYBERN, P566, DOI 10.1109/SMC.2017.8122666
[2]   Double Encoding - Slow Decoding Image to Image CNN for Foreground Identification with Application Towards Intelligent Transportation [J].
Akilan, Thangarajah ;
Wu, Jonathan .
IEEE 2018 INTERNATIONAL CONGRESS ON CYBERMATICS / 2018 IEEE CONFERENCES ON INTERNET OF THINGS, GREEN COMPUTING AND COMMUNICATIONS, CYBER, PHYSICAL AND SOCIAL COMPUTING, SMART DATA, BLOCKCHAIN, COMPUTER AND INFORMATION TECHNOLOGY, 2018, :395-403
[3]  
Akilan T, 2018, MIDWEST SYMP CIRCUIT, P889, DOI 10.1109/MWSCAS.2018.8623825
[4]   Fusion-based foreground enhancement for background subtraction using multivariate multi-model Gaussian distribution [J].
Akilan, Thangarajah ;
Wu, Q. M. Jonathan ;
Yang, Yimin .
INFORMATION SCIENCES, 2018, 430 :414-431
[5]   Road Scene Content Analysis for Driver Assistance and Autonomous Driving [J].
Altun, Melih ;
Celenk, Mehmet .
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2017, 18 (12) :3398-3407
[6]  
[Anonymous], 2016, P RSS WORKSH PLANN H
[7]  
[Anonymous], 2000, P EUR C COMP VIS
[8]  
[Anonymous], 1997, Neural Computation
[9]   A deep convolutional neural network for video sequence background subtraction [J].
Babaee, Mohammadreza ;
Duc Tung Dinh ;
Rigoll, Gerhard .
PATTERN RECOGNITION, 2018, 76 :635-649
[10]   Subspace-based background subtraction applied to aeroacoustic wind tunnel testing [J].
Bahr, Christopher J. ;
Horne, William C. .
INTERNATIONAL JOURNAL OF AEROACOUSTICS, 2017, 16 (4-5) :299-325