Rethinking RGB-D Salient Object Detection: Models, Data Sets, and Large-Scale Benchmarks

Cited by: 422
Authors
Fan, Deng-Ping [1 ,2 ]
Lin, Zheng [1 ]
Zhang, Zhao [1 ]
Zhu, Menglong [3 ]
Cheng, Ming-Ming [1 ]
Affiliations
[1] Nankai Univ, Coll Comp Sci, Tianjin 300350, Peoples R China
[2] Inception Inst Artificial Intelligence IIAI, Abu Dhabi, U Arab Emirates
[3] Google AI, Mountain View, CA 94043 USA
Keywords
Benchmark; RGB-D; saliency; salient object detection (SOD); Salient Person (SIP) data set; fusion; network; contrast
DOI
10.1109/TNNLS.2020.2996406
CLC classification
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
The use of RGB-D information for salient object detection (SOD) has been extensively explored in recent years. However, relatively few efforts have been put toward modeling SOD in real-world human activity scenes with RGB-D. In this article, we fill the gap by making the following contributions to RGB-D SOD: 1) we carefully collect a new Salient Person (SIP) data set that consists of ~1K high-resolution images covering diverse real-world scenes with varied viewpoints, poses, occlusions, illuminations, and backgrounds; 2) we conduct a large-scale (and, so far, the most comprehensive) benchmark comparing contemporary methods, which has long been missing in the field and can serve as a baseline for future research; we systematically summarize 32 popular models and evaluate 18 of these 32 models on seven data sets containing a total of about 97k images; and 3) we propose a simple general architecture, called the deep depth-depurator network (D3Net). It consists of a depth depurator unit (DDU) and a three-stream feature learning module (FLM), which perform low-quality depth map filtering and cross-modal feature learning, respectively. These components form a nested structure and are elaborately designed to be learned jointly. D3Net exceeds the performance of all prior contenders across all five metrics under consideration, thus serving as a strong model to advance research in this field. We also demonstrate that D3Net can efficiently extract salient object masks from real scenes, enabling an effective background-changing application at 65 frames/s on a single GPU. All the saliency maps, our new SIP data set, the D3Net model, and the evaluation tools are publicly available at https://github.com/DengPingFan/D3NetBenchmark.
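The gating role of the depth depurator unit, choosing whether to trust the depth modality before cross-modal fusion, can be illustrated with a minimal sketch. This is not the authors' implementation: the entropy-based quality proxy, the threshold `tau`, and the function names are hypothetical stand-ins for the learned DDU described in the paper.

```python
import numpy as np

def depth_quality_score(depth):
    """Hypothetical proxy for depth-map quality: informative depth maps
    tend to have a spread-out (non-degenerate) value histogram, so we
    use normalized histogram entropy as a crude quality score in [0, 1]."""
    hist, _ = np.histogram(depth, bins=16, range=(0.0, 1.0))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))
    return entropy / 4.0  # log2(16) = 4 is the maximum attainable entropy

def gated_prediction(rgb_pred, rgbd_pred, depth, tau=0.5):
    """Mimic the depurator gate: keep the fused RGB-D stream's output
    when the depth map looks informative, otherwise fall back to the
    RGB-only stream, so a degenerate depth map cannot hurt the result."""
    if depth_quality_score(depth) >= tau:
        return rgbd_pred  # depth passes the quality check: use fusion
    return rgb_pred       # low-quality depth: discard the depth modality
```

In the actual D3Net, this selection is learned jointly with the three feature-learning streams rather than hard-coded; the sketch only conveys why filtering low-quality depth maps before fusion is useful.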
Pages: 2075-2089
Page count: 15