Transformer with large convolution kernel decoder network for salient object detection in optical remote sensing images

Cited by: 15
Authors
Dong, Pengwei [1 ]
Wang, Bo [1 ]
Cong, Runmin [2 ]
Sun, Hai-Han [3 ]
Li, Chongyi [4 ]
Affiliations
[1] Ningxia Univ, Sch Elect & Elect Engn, Yinchuan, Peoples R China
[2] Shandong Univ, Sch Control Sci & Engn, Shandong, Peoples R China
[3] Univ Wisconsin Madison, Dept Elect & Comp Engn, Madison, WI USA
[4] Nankai Univ, Sch Comp Sci, Tianjin, Peoples R China
Keywords
Salient object detection; Optical remote sensing image; Transformer; Large convolutional kernel; ATTENTION; MODEL;
DOI
10.1016/j.cviu.2023.103917
Chinese Library Classification: TP18 [Artificial Intelligence Theory];
Discipline codes: 081104; 0812; 0835; 1405;
Abstract
Although salient object detection in optical remote sensing images (ORSI-SOD) has made great strides in recent years, it remains a challenging task due to the varied scales and shapes of objects, cluttered backgrounds, and diverse imaging orientations. Most previous deep learning-based methods fail to capture local and global features effectively, resulting in ambiguous localization and semantic information as well as inaccurate detail and boundary prediction. In this paper, we propose a novel Transformer with a large-convolutional-kernel decoding network, named TLCKD-Net, which effectively models the long-range dependencies that are indispensable for ORSI-SOD feature extraction. First, a Transformer backbone network is used to perceive both the global structure and the local details of salient objects. Second, a large-convolutional-kernel decoding module based on the self-attention mechanism is designed to extract feature information at multiple scales, accommodating salient objects of different sizes. Then, a large-convolution refinement stage and a Salient Feature Enhancement Module recover and refine the saliency features to obtain high-quality saliency maps. Extensive experiments on two public ORSI-SOD datasets show that the proposed method outperforms 16 state-of-the-art methods both qualitatively and quantitatively, and a series of ablation studies demonstrates the effectiveness of the individual modules. Our source code is publicly available at https://github.com/Dpw506/TLCKD-Net.
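The decoding idea described in the abstract (encoder features refined by a convolution with a large kernel to widen the receptive field) can be illustrated with a minimal PyTorch sketch. This is not the authors' released TLCKD-Net code (see the GitHub link above); the module name LargeKernelDecoderBlock, the 7x7 depthwise kernel size, and the channel sizes are illustrative assumptions.

```python
# Minimal sketch of a large-kernel decoder block, assuming PyTorch.
# Hypothetical module; not the authors' TLCKD-Net implementation.
import torch
import torch.nn as nn

class LargeKernelDecoderBlock(nn.Module):
    """Refines an encoder feature map with a large depthwise convolution
    (broad spatial context) followed by a pointwise channel mixer."""
    def __init__(self, channels: int, kernel_size: int = 7):
        super().__init__()
        padding = kernel_size // 2
        # Depthwise large-kernel convolution: gathers wide spatial context per channel.
        self.dw = nn.Conv2d(channels, channels, kernel_size,
                            padding=padding, groups=channels)
        # Pointwise convolution: mixes information across channels.
        self.pw = nn.Conv2d(channels, channels, kernel_size=1)
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection preserves the original encoder detail.
        return x + self.act(self.norm(self.pw(self.dw(x))))

if __name__ == "__main__":
    feat = torch.randn(1, 64, 56, 56)   # e.g. one Transformer-stage feature map
    block = LargeKernelDecoderBlock(64)
    print(block(feat).shape)            # torch.Size([1, 64, 56, 56])
```

The depthwise/pointwise split keeps a 7x7 (or larger) kernel affordable in parameters while still enlarging the receptive field, which is the general motivation for large-kernel decoders in dense prediction.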
Pages: 12