CNN-based encoder-decoder networks for salient object detection: A comprehensive review and recent advances

Cited by: 185
Authors
Ji, Yuzhu [1 ]
Zhang, Haijun [1 ]
Zhang, Zhao [2 ]
Liu, Ming [3 ]
Affiliations
[1] Harbin Institute of Technology, Department of Computer Science, Shenzhen, China
[2] Hefei University of Technology, Department of Computer Science, Hefei, China
[3] Harbin Institute of Technology, School of Astronautics, Harbin, China
Funding
National Natural Science Foundation of China
Keywords
Salient object detection; Encoder-decoder model; Pixel-level classification; Video saliency; Empirical study
DOI
10.1016/j.ins.2020.09.003
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Convolutional neural network (CNN)-based encoder-decoder models have profoundly influenced recent work in salient object detection (SOD). Despite the rapid development of encoder-decoder models for pixel-level dense prediction tasks, no empirical study has yet evaluated a large body of encoder-decoder models on SOD. In this paper, instead of limiting our survey to SOD methods, we present a broader view of the fundamental architectures of key modules and structures in CNN-based encoder-decoder models for pixel-level dense prediction. We then focus on performing SOD with deep encoder-decoder models, and present an extensive empirical study of baseline encoder-decoder models with respect to different encoder backbones, loss functions, training batch sizes, and attention structures. We also investigate state-of-the-art encoder-decoder models adapted from semantic segmentation, together with deep CNN-based SOD models. This study uncovered new baseline models that outperform the state of the art; these models were further evaluated on three video-based SOD benchmark datasets. Experimental results demonstrate the effectiveness of these baseline models on both image- and video-based SOD tasks. The study concludes with a comprehensive summary that offers suggestions for future research. (c) 2020 Elsevier Inc. All rights reserved.
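To make the abstract's baseline setup concrete, the following is a minimal PyTorch sketch of a U-shaped encoder-decoder that predicts a one-channel saliency map and is trained with a pixel-wise loss. The module names (ConvBlock, EncoderDecoderSOD), the two-level depth, and the channel widths are illustrative assumptions, not the exact architectures evaluated in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvBlock(nn.Module):
    """Two 3x3 conv + ReLU layers, the usual encoder/decoder building block."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class EncoderDecoderSOD(nn.Module):
    """Hypothetical U-shaped baseline: encoder, decoder with one skip
    connection, and a 1x1 head producing a one-channel saliency map."""
    def __init__(self):
        super().__init__()
        self.enc1 = ConvBlock(3, 64)          # full-resolution features
        self.enc2 = ConvBlock(64, 128)        # downsampled, higher-level features
        self.pool = nn.MaxPool2d(2)
        self.dec1 = ConvBlock(128 + 64, 64)   # fuses upsampled deep features with enc1
        self.head = nn.Conv2d(64, 1, 1)       # pixel-level binary prediction

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        up = F.interpolate(e2, scale_factor=2, mode='bilinear', align_corners=False)
        d1 = self.dec1(torch.cat([up, e1], dim=1))  # skip connection
        return torch.sigmoid(self.head(d1))         # per-pixel saliency in (0, 1)

# One training step with a pixel-wise loss (binary cross-entropy here; the
# paper's empirical study also compares other loss functions):
model = EncoderDecoderSOD()
image = torch.randn(1, 3, 224, 224)   # dummy input batch
target = torch.rand(1, 1, 224, 224)   # dummy ground-truth saliency mask
loss = F.binary_cross_entropy(model(image), target)
loss.backward()
```

Swapping the encoder stages for a pretrained backbone (e.g., VGG or ResNet features) and varying the loss function, batch size, or attention structure reproduces the kind of baseline grid the empirical study describes.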
Pages: 835-857
Number of pages: 23