Progressive Self-Prompting Segment Anything Model for Salient Object Detection in Optical Remote Sensing Images

Cited by: 0
Authors
Zhang, Xiaoning [1, 2]
Yu, Yi [1 ]
Li, Daqun [1 ]
Wang, Yuqing [1 ]
Affiliations
[1] Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, People's Republic of China
[2] University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
Keywords
salient object detection; optical remote sensing images; Segment Anything Model; domain-specific prompting module; progressive self-prompting decoder module; parameter-efficient fine-tuning; COLLABORATION NETWORK; FREQUENCY; ATTENTION;
DOI
10.3390/rs17020342
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
With the continuous advancement of deep neural networks, salient object detection (SOD) in natural images has made significant progress. However, SOD in optical remote sensing images (ORSI-SOD) remains a challenging task due to the diversity of objects and the complexity of backgrounds. The primary challenge lies in generating robust features that can effectively integrate both global semantic information for salient object localization and local spatial details for boundary reconstruction. Most existing ORSI-SOD methods rely on pre-trained CNN- or Transformer-based backbones to extract features from ORSIs, followed by multi-level feature aggregation. Given the significant differences between ORSIs and the natural images used in pre-training, the generalization capability of these backbone networks is often limited, resulting in suboptimal performance. Recently, prompt engineering has been employed to enhance the generalization ability of networks in the Segment Anything Model (SAM), an emerging vision foundation model that has achieved remarkable success across various tasks. Despite its success, directly applying the SAM to ORSI-SOD without prompts from manual interaction remains unsatisfactory. In this paper, we propose a novel progressive self-prompting model based on the SAM, termed PSP-SAM, which generates both internal and external prompts to enhance the network and overcome the limitations of SAM in ORSI-SOD. Specifically, domain-specific prompting modules, consisting of both block-shared and block-specific adapters, are integrated into the network to learn domain-specific visual prompts within the backbone, facilitating its adaptation to ORSI-SOD. Furthermore, we introduce a progressive self-prompting decoder module that performs prompt-guided multi-level feature integration and generates stage-wise mask prompts progressively, enabling the prompt-based mask decoders outside the backbone to predict saliency maps in a coarse-to-fine manner. 
The entire network is trained end-to-end with parameter-efficient fine-tuning. Extensive experiments on three benchmark ORSI-SOD datasets demonstrate that our proposed network achieves state-of-the-art performance.
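The adapter idea in the abstract can be illustrated with a toy sketch. The abstract does not describe the internal structure of the domain-specific prompting modules, so everything below is an assumption for illustration only: the class name `PromptedBackbone`, the choice of a block-specific down-projection paired with one block-shared up-projection, the dimensions, and the random weights are all hypothetical and are not taken from PSP-SAM. The sketch only shows how a small set of trainable adapter weights can inject a prompt into each frozen backbone block.

```python
import random


def linear(weights, x):
    """Apply a bias-free dense layer; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def make_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def count(matrix):
    return len(matrix) * len(matrix[0])


class PromptedBackbone:
    """Toy frozen backbone with adapter prompts (hypothetical layout)."""

    def __init__(self, dim=8, bottleneck=2, num_blocks=3, seed=0):
        rng = random.Random(seed)
        # Stand-in for pre-trained block weights; these stay frozen.
        self.frozen_blocks = [make_matrix(dim, dim, rng) for _ in range(num_blocks)]
        # Assumed block-specific part: one down-projection per block.
        self.specific_down = [make_matrix(bottleneck, dim, rng) for _ in range(num_blocks)]
        # Assumed block-shared part: a single up-projection reused by all blocks.
        self.shared_up = make_matrix(dim, bottleneck, rng)

    def forward(self, x):
        for frozen, down in zip(self.frozen_blocks, self.specific_down):
            # Prompt = shared_up(relu(specific_down(x))), added to the block input.
            prompt = linear(self.shared_up, relu(linear(down, x)))
            prompted = [a + b for a, b in zip(x, prompt)]
            # Frozen block applied with a residual connection.
            x = [a + b for a, b in zip(prompted, relu(linear(frozen, prompted)))]
        return x

    def parameter_counts(self):
        """Return (frozen, trainable) weight counts."""
        frozen = sum(count(m) for m in self.frozen_blocks)
        trainable = sum(count(m) for m in self.specific_down) + count(self.shared_up)
        return frozen, trainable


model = PromptedBackbone()
features = model.forward([1.0] * 8)
frozen, trainable = model.parameter_counts()
print(len(features), frozen, trainable)
```

In this toy setting only the adapter weights would be updated during training, which is the general sense of parameter-efficient fine-tuning; in a real SAM backbone the frozen weights would outnumber the adapter weights by a far larger margin than they do here.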
Pages: 22