Transformer with large convolution kernel decoder network for salient object detection in optical remote sensing images

Cited by: 15
Authors
Dong, Pengwei [1 ]
Wang, Bo [1 ]
Cong, Runmin [2 ]
Sun, Hai-Han [3 ]
Li, Chongyi [4 ]
Affiliations
[1] Ningxia Univ, Sch Elect & Elect Engn, Yinchuan, Peoples R China
[2] Shandong Univ, Sch Control Sci & Engn, Shandong, Peoples R China
[3] Univ Wisconsin Madison, Dept Elect & Comp Engn, Madison, WI USA
[4] Nankai Univ, Sch Comp Sci, Tianjin, Peoples R China
Keywords
Salient object detection; Optical remote sensing image; Transformer; Large convolutional kernel; ATTENTION; MODEL;
DOI
10.1016/j.cviu.2023.103917
Chinese Library Classification: TP18 [Artificial Intelligence Theory];
Discipline codes: 081104; 0812; 0835; 1405;
Abstract
Although salient object detection in optical remote sensing images (ORSI-SOD) has made great strides in recent years, it remains a challenging task due to the varied scales and shapes of objects, cluttered backgrounds, and diverse imaging orientations. Most previous deep learning-based methods fail to capture local and global features effectively, resulting in ambiguous localization and semantic information as well as inaccurate detail and boundary prediction. In this paper, we propose a novel Transformer with a large-convolutional-kernel decoding network, named TLCKD-Net, which effectively models the long-range dependencies that are indispensable for ORSI-SOD feature extraction. First, a Transformer backbone network is used to perceive both the global structure and the local details of salient objects. Second, a large-convolutional-kernel decoding module based on the self-attention mechanism is designed to extract feature information at multiple scales, accommodating salient objects of different sizes. Then, a large-convolution refinement stage and a Salient Feature Enhancement Module recover and refine the saliency features to obtain high-quality saliency maps. Extensive experiments on two public ORSI-SOD datasets show that the proposed method outperforms 16 state-of-the-art methods both qualitatively and quantitatively, and a series of ablation studies demonstrates the effectiveness of the individual modules. Our source code is publicly available at https://github.com/Dpw506/TLCKD-Net.
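The decoding idea described in the abstract (encoder features refined by a convolution with a large kernel to widen the receptive field) can be illustrated with a minimal PyTorch sketch. This is not the authors' released TLCKD-Net code (see the GitHub link above); the module name LargeKernelDecoderBlock, the 7x7 depthwise kernel size, and the channel sizes are illustrative assumptions.

```python
# Minimal sketch of a large-kernel decoder block, assuming PyTorch.
# Hypothetical module; not the authors' TLCKD-Net implementation.
import torch
import torch.nn as nn

class LargeKernelDecoderBlock(nn.Module):
    """Refines an encoder feature map with a large depthwise convolution
    (broad spatial context) followed by a pointwise channel mixer."""
    def __init__(self, channels: int, kernel_size: int = 7):
        super().__init__()
        padding = kernel_size // 2
        # Depthwise large-kernel convolution: gathers wide spatial context per channel.
        self.dw = nn.Conv2d(channels, channels, kernel_size,
                            padding=padding, groups=channels)
        # Pointwise convolution: mixes information across channels.
        self.pw = nn.Conv2d(channels, channels, kernel_size=1)
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection preserves the original encoder detail.
        return x + self.act(self.norm(self.pw(self.dw(x))))

if __name__ == "__main__":
    feat = torch.randn(1, 64, 56, 56)   # e.g. one Transformer-stage feature map
    block = LargeKernelDecoderBlock(64)
    print(block(feat).shape)            # torch.Size([1, 64, 56, 56])
```

The depthwise/pointwise split keeps a 7x7 (or larger) kernel affordable in parameters while still enlarging the receptive field, which is the general motivation for large-kernel decoders in dense prediction.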
Pages: 12