Aggregating transformers and CNNs for salient object detection in optical remote sensing images

Cited by: 21
Authors
Bao, Liuxin [1]
Zhou, Xiaofei [1]
Zheng, Bolun [1]
Yin, Haibing [2, 3]
Zhu, Zunjie [2, 3]
Zhang, Jiyong [1]
Yan, Chenggang [1, 2]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Automat, Hangzhou 310018, Peoples R China
[2] Hangzhou Dianzi Univ, Lishui Inst, Lishui 323000, Peoples R China
[3] Hangzhou Dianzi Univ, Sch Commun Engn, Hangzhou 310018, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Transformer; CNNs; Feature fusion; Optical RSIs; Salient object detection; ENCODER-DECODER NETWORK; ATTENTION; FEATURES;
DOI
10.1016/j.neucom.2023.126560
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Salient object detection (SOD) in optical remote sensing images (RSIs) plays a significant role in many areas, such as agriculture, environmental protection, and the military. However, because RSIs differ from natural scene images (NSIs) in imaging mode and image complexity, it is difficult to achieve remarkable results by directly extending saliency methods designed for NSIs to RSIs. In addition, we note that U-Net models based on convolutional neural networks (CNNs) cannot effectively capture global long-range dependencies, while the Transformer does not adequately characterize the local spatial details of each patch. Therefore, to conduct salient object detection in RSIs, we propose a novel two-branch network for Aggregating the Transformers and CNNs, namely ATC-Net, in which local spatial details and global semantic information are fused into the final high-quality saliency map. Specifically, our saliency model adopts an encoder-decoder architecture comprising two parallel encoder branches and a decoder. First, the two parallel encoder branches extract global and local features using a Transformer and CNNs, respectively. Then, the decoder employs a series of feature-enhanced fusion (FF) modules to aggregate multi-level global and local features through interactive guidance and to enhance the fused feature via an attention mechanism. Finally, the decoder deploys the read-out (RO) module to fuse the aggregated feature from the FF modules with the low-level CNN feature, steering the feature to focus more on local spatial details. Extensive experiments are performed on two public optical RSI datasets, and the results show that our saliency model consistently outperforms 30 state-of-the-art methods.
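The data flow described in the abstract (two encoder branches, a chain of FF modules, then an RO module) can be illustrated with a toy sketch. This is not the authors' implementation: the function names `ff_module`, `ro_module`, and `atc_forward` are hypothetical, features are plain lists of floats rather than 2D feature maps, and the sigmoid gating and softmax weighting are assumed stand-ins for the learned "interactive guidance" and "attention mechanism", whose actual design the abstract does not specify.

```python
import math


def sigmoid(x):
    """Logistic function, used here as a simple gate."""
    return 1.0 / (1.0 + math.exp(-x))


def ff_module(global_feat, local_feat):
    """Hypothetical stand-in for one feature-enhanced fusion (FF) step."""
    # Assumed "interactive guidance": each branch is gated by the other one.
    guided_global = [g * sigmoid(l) for g, l in zip(global_feat, local_feat)]
    guided_local = [l * sigmoid(g) for g, l in zip(global_feat, local_feat)]
    fused = [a + b for a, b in zip(guided_global, guided_local)]
    # Assumed "attention": softmax weights over channels, applied residually.
    peak = max(fused)
    exps = [math.exp(f - peak) for f in fused]
    total = sum(exps)
    return [f * (1.0 + e / total) for f, e in zip(fused, exps)]


def ro_module(aggregated, low_level_cnn):
    """Hypothetical read-out (RO): merge with the low-level CNN feature."""
    return [sigmoid(a + c) for a, c in zip(aggregated, low_level_cnn)]


def atc_forward(global_levels, local_levels):
    """Fuse per-level features (deep to shallow), then read out a saliency vector."""
    aggregated = None
    for g, l in zip(global_levels, local_levels):
        fused = ff_module(g, l)
        if aggregated is None:
            aggregated = fused
        else:
            aggregated = [a + f for a, f in zip(aggregated, fused)]
    # The last entry of local_levels plays the role of the low-level CNN feature.
    return ro_module(aggregated, local_levels[-1])


# Toy inputs: two levels, three channels each (values are arbitrary).
transformer_feats = [[0.5, -1.0, 2.0], [1.0, 0.0, -0.5]]
cnn_feats = [[0.1, 0.3, -0.2], [0.0, 1.5, 0.7]]
saliency = atc_forward(transformer_feats, cnn_feats)
```

In this sketch the output has one value per channel, each squashed into the open interval (0, 1); a real model would instead produce a per-pixel saliency map from learned convolutional and attention layers.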
Pages: 14