Category-Aware Saliency Enhance Learning Based on CLIP for Weakly Supervised Salient Object Detection

Cited by: 0
Authors
Yunde Zhang
Zhili Zhang
Tianshan Liu
Jun Kong
Affiliations
[1] Jiangnan University,Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education)
[2] Anhui University,School of Computer Science and Technology
[3] The Hong Kong Polytechnic University,Department of Electronic and Information Engineering
Source
Neural Processing Letters, Volume 56
Keywords
Weakly supervised; Salient object detection; Category-aware Saliency Enhance Learning; CLIP;
DOI: not available
Abstract
Weakly supervised salient object detection (SOD) using image-level category labels has been proposed to reduce the annotation cost of pixel-level labels. However, existing methods mostly train a classification network to generate a class activation map, which suffers from coarse localization and difficult pseudo-label updating. To address these issues, we propose a novel Category-aware Saliency Enhance Learning (CSEL) method based on contrastive vision-language pre-training (CLIP), which can perform image-text classification and pseudo-label updating simultaneously. Our proposed method transforms image-text classification into pixel-text matching and generates a category-aware saliency map, which is evaluated by the classification accuracy. Moreover, CSEL assesses the quality of the category-aware saliency map and the pseudo saliency map, and uses the quality confidence scores as weights to update the pseudo labels. The two maps mutually enhance each other to guide the pseudo saliency map in the correct direction. Our SOD network can be trained jointly under the supervision of the updated pseudo saliency maps. We test our model on various well-known RGB-D and RGB SOD datasets. Our model achieves an S-measure of 87.6% on the RGB-D NLPR dataset and 84.3% on the RGB ECSSD dataset. Additionally, we obtain satisfactory performance on the weakly supervised E-measure, F-measure, and mean absolute error metrics for other datasets. These results demonstrate the effectiveness of our model.
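The abstract describes updating pseudo labels by weighting the category-aware saliency map and the current pseudo saliency map with their quality confidence scores. A minimal sketch of such a confidence-weighted fusion is given below; the weighted-average form, the function name `update_pseudo_label`, and the toy maps are assumptions for illustration, since the abstract does not state the exact update rule.

```python
import numpy as np

def update_pseudo_label(cat_saliency, pseudo_saliency, w_cat, w_pseudo):
    """Fuse a category-aware saliency map with the current pseudo saliency
    map, weighting each by its quality confidence score (a convex
    combination; the paper's exact rule may differ)."""
    fused = (w_cat * cat_saliency + w_pseudo * pseudo_saliency) / (w_cat + w_pseudo)
    # Keep the updated pseudo label in the valid saliency range [0, 1].
    return np.clip(fused, 0.0, 1.0)

# Toy 2x2 saliency maps (values in [0, 1]); weights are hypothetical
# confidence scores, e.g. derived from classification accuracy.
cat_map = np.array([[0.9, 0.2], [0.1, 0.8]])
pseudo_map = np.array([[0.7, 0.4], [0.3, 0.6]])
updated = update_pseudo_label(cat_map, pseudo_map, w_cat=0.6, w_pseudo=0.4)
```

With a higher confidence on the category-aware map (`w_cat=0.6`), the updated pseudo label is pulled toward it, which mirrors the mutual-enhancement idea in the abstract.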