Treasure in the background: Improve saliency object detection by self-supervised contrast learning

Times Cited: 1
Authors
Dong, Haoji [1 ]
Wu, Jie [1 ]
Xing, Chengcheng [1 ]
Xi, Heran [2 ]
Cui, Hui [3 ]
Zhu, Jinghua [1 ]
Affiliations
[1] Heilongjiang Univ, Sch Comp Sci & Technol, Harbin 150000, Peoples R China
[2] Heilongjiang Univ, Sch Elect Engn, Harbin 150001, Peoples R China
[3] La Trobe Univ, Dept Comp Sci & Informat Technol, Melbourne, 3000, Australia
Keywords
Salient object detection; Contrast learning; Self-supervised learning; Vision transformer; Vision Graph Neural Network
DOI
10.1016/j.eswa.2024.126244
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Salient object detection (SOD) is a critical task in computer vision that aims to identify the visually striking regions of an image. Existing SOD methods predict saliency maps in a supervised manner that relies heavily on labels. These methods face two challenges: (1) poor boundary detection when salient objects closely resemble the background; and (2) high false-positive rates caused by focusing on the objects while neglecting their surroundings. Better solutions are therefore needed to improve the comprehensiveness and precision of SOD results. Inspired by a pilot study, which revealed that supervised learning tends to focus on prominent regions while neglecting background information around objects, whereas self-supervised learning captures more comprehensive details, we introduce self-supervised contrast learning into the SOD framework. We design image-level and pixel-level contrast learning for SOD models with Token-to-Token Vision Transformer (T2T) and Vision Graph Neural Network (ViG) backbones. In fact, our approach is backbone-agnostic and can be applied as a plugin to any model. We conduct comprehensive comparison and ablation experiments on both RGB natural-image datasets and medical-image datasets; the results demonstrate that our method consistently outperforms state-of-the-art methods. Most importantly, our method not only provides a new perspective on the SOD task but also suggests a new paradigm for other dense prediction tasks. Code is available at https://github.com/msctransu/SCL_SOD.git.
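The record carries no implementation details, but the abstract's two objectives (image-level and pixel-level contrast learning) can be illustrated with a minimal InfoNCE-style sketch in PyTorch. Everything below, including the function names, the prototype-based pixel objective, and the temperature value, is an illustrative assumption rather than the authors' released code (see the linked repository for that):

import torch
import torch.nn.functional as F

def info_nce(query, positive, temperature=0.07):
    # InfoNCE: the i-th row of `positive` is the positive for the i-th query;
    # every other row in the batch acts as a negative.
    query = F.normalize(query, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = query @ positive.t() / temperature                # (B, B) similarities
    labels = torch.arange(query.size(0), device=query.device)
    return F.cross_entropy(logits, labels)

def image_level_loss(encoder, view1, view2):
    # Image-level contrast: two augmentations of the same image are positives.
    # `encoder` is assumed to map (B, C, H, W) images to (B, D) embeddings.
    return info_nce(encoder(view1), encoder(view2))

def pixel_level_loss(feat, mask, temperature=0.07):
    # Pixel-level contrast: pull each pixel embedding toward the prototype of
    # its own region (foreground or background) and away from the other one.
    # feat: (B, D, H, W) dense features; mask: (B, 1, H, W) binary saliency map.
    feat = F.normalize(feat, dim=1)
    fg = (feat * mask).sum(dim=(2, 3)) / mask.sum(dim=(2, 3)).clamp(min=1e-6)
    bg = (feat * (1 - mask)).sum(dim=(2, 3)) / (1 - mask).sum(dim=(2, 3)).clamp(min=1e-6)
    fg, bg = F.normalize(fg, dim=1), F.normalize(bg, dim=1)   # (B, D) prototypes
    pixels = feat.flatten(2).transpose(1, 2)                  # (B, HW, D)
    sim_fg = (pixels @ fg.unsqueeze(2)).squeeze(2) / temperature
    sim_bg = (pixels @ bg.unsqueeze(2)).squeeze(2) / temperature
    logits = torch.stack([sim_fg, sim_bg], dim=-1)            # (B, HW, 2)
    target = (mask.flatten(2).squeeze(1) < 0.5).long()        # 0 = fg, 1 = bg
    return F.cross_entropy(logits.reshape(-1, 2), target.reshape(-1))

In a plugin setting, these terms would presumably be added, with weighting coefficients, to the usual supervised saliency loss; the paper's actual formulation is in the repository linked above.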
Pages: 11