Rethinking RGB-D Salient Object Detection: Models, Data Sets, and Large-Scale Benchmarks

Cited by: 473
Authors
Fan, Deng-Ping [1 ,2 ]
Lin, Zheng [1 ]
Zhang, Zhao [1 ]
Zhu, Menglong [3 ]
Cheng, Ming-Ming [1 ]
Affiliations
[1] Nankai Univ, Coll Comp Sci, Tianjin 300350, Peoples R China
[2] Inception Inst Artificial Intelligence IIAI, Abu Dhabi, U Arab Emirates
[3] Google AI, Mountain View, CA 94043 USA
Keywords
Benchmark; RGB-D; saliency; salient object detection (SOD); Salient Person (SIP) data set; fusion; network; contrast
DOI
10.1109/TNNLS.2020.2996406
CLC number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The use of RGB-D information for salient object detection (SOD) has been extensively explored in recent years. However, relatively few efforts have been put toward modeling SOD in real-world human activity scenes with RGB-D. In this article, we fill the gap by making the following contributions to RGB-D SOD: 1) we carefully collect a new Salient Person (SIP) data set that consists of ~1K high-resolution images covering diverse real-world scenes from various viewpoints, poses, occlusions, illuminations, and backgrounds; 2) we conduct a large-scale (and, so far, the most comprehensive) benchmark comparing contemporary methods, which has long been missing in the field and can serve as a baseline for future research, and we systematically summarize 32 popular models and evaluate 18 of the 32 models on seven data sets containing a total of about 97K images; and 3) we propose a simple general architecture, called the deep depth-depurator network (D³Net). It consists of a depth depurator unit (DDU) and a three-stream feature learning module (FLM), which perform low-quality depth-map filtering and cross-modal feature learning, respectively. These components form a nested structure and are elaborately designed to be learned jointly. D³Net exceeds the performance of all prior contenders across all five metrics under consideration, thus serving as a strong model to advance research in this field. We also demonstrate that D³Net can efficiently extract salient object masks from real scenes, enabling an effective background-changing application at 65 frames/s on a single GPU. All the saliency maps, our new SIP data set, the D³Net model, and the evaluation tools are publicly available at https://github.com/DengPingFan/D3NetBenchmark.
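To make the two-component design described in the abstract concrete, the following is a minimal PyTorch-style sketch, not the authors' released D³Net code: it assumes tiny convolutional encoders, that the three streams take RGB, depth-only, and early-fused RGB-D inputs, and a soft depth-quality gate in the DDU. These specifics are illustrative assumptions; the abstract only states that the DDU filters low-quality depth maps and the FLM performs cross-modal feature learning, with the two learned jointly.

```python
# Minimal illustrative sketch (NOT the authors' released D3Net implementation).
# Assumed details: toy convolutional blocks, stream inputs (RGB / depth /
# early-fused RGB-D), and a soft depth-quality gate.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class DepthDepurator(nn.Module):
    """Scores depth-map quality in [0, 1]; low scores down-weight depth cues."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(conv_block(1, 16), nn.AdaptiveAvgPool2d(1))
        self.score = nn.Sequential(nn.Flatten(), nn.Linear(16, 1), nn.Sigmoid())

    def forward(self, depth):
        return self.score(self.encoder(depth))          # shape (B, 1)


class Stream(nn.Module):
    """One small stream predicting a saliency map from its input modality."""

    def __init__(self, in_ch):
        super().__init__()
        self.body = nn.Sequential(conv_block(in_ch, 32), conv_block(32, 32))
        self.head = nn.Conv2d(32, 1, kernel_size=1)

    def forward(self, x):
        return torch.sigmoid(self.head(self.body(x)))   # shape (B, 1, H, W)


class D3NetSketch(nn.Module):
    """Depth depurator + three-stream feature learning, trained jointly."""

    def __init__(self):
        super().__init__()
        self.ddu = DepthDepurator()
        self.rgb_stream = Stream(3)    # RGB only
        self.depth_stream = Stream(1)  # depth only
        self.rgbd_stream = Stream(4)   # early-fused RGB-D

    def forward(self, rgb, depth):
        q = self.ddu(depth).view(-1, 1, 1, 1)            # depth-quality gate
        rgbd_pred = self.rgbd_stream(torch.cat([rgb, depth], dim=1))
        depth_pred = self.depth_stream(depth)
        rgb_pred = self.rgb_stream(rgb)
        # Trust the depth-dependent streams only when the estimated depth
        # quality is high; otherwise fall back toward the RGB-only prediction.
        return q * 0.5 * (rgbd_pred + depth_pred) + (1 - q) * rgb_pred


if __name__ == "__main__":
    model = D3NetSketch()
    rgb = torch.rand(2, 3, 224, 224)
    depth = torch.rand(2, 1, 224, 224)
    print(model(rgb, depth).shape)                       # torch.Size([2, 1, 224, 224])
```

The gating rule shown here (a convex combination driven by the predicted depth quality) is one simple way to realize "low-quality depth map filtering" in a differentiable, jointly trainable form; the published D³Net may implement this selection differently.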
Pages: 2075-2089 (15 pages)