Category-Aware Saliency Enhance Learning Based on CLIP for Weakly Supervised Salient Object Detection

Cited by: 2

Authors:

Zhang, Yunde [1]

Zhang, Zhili [2]

Liu, Tianshan [3]

Kong, Jun [1]

Affiliations:

[1] Jiangnan Univ, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Peoples R China

[2] Anhui Univ, Sch Comp Sci & Technol, Hefei, Anhui, Peoples R China

[3] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong 999077, Peoples R China

Source:

NEURAL PROCESSING LETTERS | 2024 / Vol. 56 / No. 2

Funding:

National Natural Science Foundation of China;

Keywords:

Weakly supervised; Salient object detection; Category-aware Saliency Enhance Learning; CLIP;

DOI:

10.1007/s11063-024-11530-2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Weakly supervised salient object detection (SOD) using image-level category labels has been proposed to reduce the annotation cost of pixel-level labels. However, existing methods mostly train a classification network to generate a class activation map, which suffers from coarse localization and difficult pseudo-label updating. To address these issues, we propose a novel Category-aware Saliency Enhance Learning (CSEL) method based on contrastive vision-language pre-training (CLIP), which can perform image-text classification and pseudo-label updating simultaneously. Our proposed method transforms image-text classification into pixel-text matching and generates a category-aware saliency map, which is evaluated by the classification accuracy. Moreover, CSEL assesses the quality of the category-aware saliency map and the pseudo saliency map, and uses the quality confidence scores as weights to update the pseudo labels. The two maps mutually enhance each other to guide the pseudo saliency map in the correct direction. Our SOD network can be trained jointly under the supervision of the updated pseudo saliency maps. We test our model on various well-known RGB-D and RGB SOD datasets. Our model achieves an S-measure of 87.6% on the RGB-D NLPR dataset and 84.3% on the RGB ECSSD dataset. Additionally, we obtain satisfactory performance on the weakly supervised E-measure, F-measure, and mean absolute error metrics for other datasets. These results demonstrate the effectiveness of our model.
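The abstract describes updating pseudo labels by weighting two saliency maps with quality confidence scores, but it does not give the update rule. The sketch below is only one plausible reading, assuming a normalized weighted average; the function names `fuse_pseudo_label` and `mean_absolute_error`, the `Map` type, and the example confidence values are all hypothetical and are not taken from the paper.

```python
from typing import List

# A saliency map as rows of values in [0, 1] (hypothetical representation).
Map = List[List[float]]


def fuse_pseudo_label(category_map: Map, pseudo_map: Map,
                      category_conf: float, pseudo_conf: float) -> Map:
    """Confidence-weighted average of two saliency maps.

    Assumption: the two confidence scores are normalized to sum to 1 and
    used as per-map weights. The paper's actual rule may differ.
    """
    total = category_conf + pseudo_conf
    if total <= 0:
        raise ValueError("confidence scores must sum to a positive value")
    w_cat = category_conf / total
    w_pseudo = pseudo_conf / total
    return [
        [w_cat * c + w_pseudo * p for c, p in zip(c_row, p_row)]
        for c_row, p_row in zip(category_map, pseudo_map)
    ]


def mean_absolute_error(pred: Map, target: Map) -> float:
    """Mean absolute per-pixel difference between two equally sized maps."""
    diffs = [abs(p - t)
             for p_row, t_row in zip(pred, target)
             for p, t in zip(p_row, t_row)]
    return sum(diffs) / len(diffs)


# Toy 2x2 maps; the confidence values 3.0 and 1.0 are made up for illustration.
category_map = [[1.0, 0.0], [0.5, 0.5]]
pseudo_map = [[0.0, 0.0], [0.5, 1.0]]
updated = fuse_pseudo_label(category_map, pseudo_map, 3.0, 1.0)
```

With these toy inputs the higher-confidence category-aware map dominates the updated pseudo label, which is the behavior the abstract suggests when one map is judged more reliable than the other.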

Pages: 22

68 references in total
