Rethinking RGB-D Salient Object Detection: Models, Data Sets, and Large-Scale Benchmarks

Cited by: 473
Authors
Fan, Deng-Ping [1 ,2 ]
Lin, Zheng [1 ]
Zhang, Zhao [1 ]
Zhu, Menglong [3 ]
Cheng, Ming-Ming [1 ]
Affiliations
[1] Nankai Univ, Coll Comp Sci, Tianjin 300350, Peoples R China
[2] Inception Inst Artificial Intelligence IIAI, Abu Dhabi, U Arab Emirates
[3] Google AI, Mountain View, CA 94043 USA
Keywords
Benchmark; RGB-D; saliency; salient object detection (SOD); Salient Person (SIP) data set; fusion; network; contrast
DOI
10.1109/TNNLS.2020.2996406
CLC number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The use of RGB-D information for salient object detection (SOD) has been extensively explored in recent years. However, relatively few efforts have been put toward modeling SOD in real-world human activity scenes with RGB-D. In this article, we fill the gap by making the following contributions to RGB-D SOD: 1) we carefully collect a new Salient Person (SIP) data set that consists of ~1K high-resolution images covering diverse real-world scenes from various viewpoints, poses, occlusions, illuminations, and backgrounds; 2) we conduct a large-scale (and, so far, the most comprehensive) benchmark comparing contemporary methods, which has long been missing in the field and can serve as a baseline for future research, and we systematically summarize 32 popular models and evaluate 18 of the 32 models on seven data sets containing a total of about 97K images; and 3) we propose a simple general architecture, called the deep depth-depurator network (D³Net). It consists of a depth depurator unit (DDU) and a three-stream feature learning module (FLM), which perform low-quality depth-map filtering and cross-modal feature learning, respectively. These components form a nested structure and are elaborately designed to be learned jointly. D³Net exceeds the performance of all prior contenders across all five metrics under consideration, thus serving as a strong model to advance research in this field. We also demonstrate that D³Net can efficiently extract salient object masks from real scenes, enabling an effective background-changing application at 65 frames/s on a single GPU. All the saliency maps, our new SIP data set, the D³Net model, and the evaluation tools are publicly available at https://github.com/DengPingFan/D3NetBenchmark.
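To make the two-component design described in the abstract concrete, the following is a minimal PyTorch-style sketch, not the authors' released D³Net code: it assumes tiny convolutional encoders, that the three streams take RGB, depth-only, and early-fused RGB-D inputs, and a soft depth-quality gate in the DDU. These specifics are illustrative assumptions; the abstract only states that the DDU filters low-quality depth maps and the FLM performs cross-modal feature learning, with the two learned jointly.

```python
# Minimal illustrative sketch (NOT the authors' released D3Net implementation).
# Assumed details: toy convolutional blocks, stream inputs (RGB / depth /
# early-fused RGB-D), and a soft depth-quality gate.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class DepthDepurator(nn.Module):
    """Scores depth-map quality in [0, 1]; low scores down-weight depth cues."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(conv_block(1, 16), nn.AdaptiveAvgPool2d(1))
        self.score = nn.Sequential(nn.Flatten(), nn.Linear(16, 1), nn.Sigmoid())

    def forward(self, depth):
        return self.score(self.encoder(depth))          # shape (B, 1)


class Stream(nn.Module):
    """One small stream predicting a saliency map from its input modality."""

    def __init__(self, in_ch):
        super().__init__()
        self.body = nn.Sequential(conv_block(in_ch, 32), conv_block(32, 32))
        self.head = nn.Conv2d(32, 1, kernel_size=1)

    def forward(self, x):
        return torch.sigmoid(self.head(self.body(x)))   # shape (B, 1, H, W)


class D3NetSketch(nn.Module):
    """Depth depurator + three-stream feature learning, trained jointly."""

    def __init__(self):
        super().__init__()
        self.ddu = DepthDepurator()
        self.rgb_stream = Stream(3)    # RGB only
        self.depth_stream = Stream(1)  # depth only
        self.rgbd_stream = Stream(4)   # early-fused RGB-D

    def forward(self, rgb, depth):
        q = self.ddu(depth).view(-1, 1, 1, 1)            # depth-quality gate
        rgbd_pred = self.rgbd_stream(torch.cat([rgb, depth], dim=1))
        depth_pred = self.depth_stream(depth)
        rgb_pred = self.rgb_stream(rgb)
        # Trust the depth-dependent streams only when the estimated depth
        # quality is high; otherwise fall back toward the RGB-only prediction.
        return q * 0.5 * (rgbd_pred + depth_pred) + (1 - q) * rgb_pred


if __name__ == "__main__":
    model = D3NetSketch()
    rgb = torch.rand(2, 3, 224, 224)
    depth = torch.rand(2, 1, 224, 224)
    print(model(rgb, depth).shape)                       # torch.Size([2, 1, 224, 224])
```

The gating rule shown here (a convex combination driven by the predicted depth quality) is one simple way to realize "low-quality depth map filtering" in a differentiable, jointly trainable form; the published D³Net may implement this selection differently.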
Pages: 2075-2089 (15 pages)