Semantic Segmentation of Remote Sensing Images by Interactive Representation Refinement and Geometric Prior-Guided Inference

Cited by: 60
Authors
Li, Xin [1, 2]
Xu, Feng [1, 2, 3]
Liu, Fan [1, 2]
Tong, Yao [4]
Lyu, Xin [1, 2]
Zhou, Jun [5]
Affiliations
[1] Hohai Univ, Coll Comp & Informat, Nanjing 211100, Peoples R China
[2] Hohai Univ, Key Lab Water Big Data Technol, Minist Water Resources, Nanjing 211100, Peoples R China
[3] Jiangsu Ocean Univ, Sch Comp Engn, Lianyungang 222005, Peoples R China
[4] Nanjing Univ Chinese Med, Sch Artificial Intelligence & Informat Technol, Nanjing 210023, Peoples R China
[5] Griffith Univ, Sch Informat & Commun Technol, Nathan, Qld 4111, Australia
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024 / Vol. 62
Keywords
Attention bias; contextual affinity; remote sensing images (RSIs); semantic segmentation; synergistic attention; NETWORKS;
DOI
10.1109/TGRS.2023.3339291
Chinese Library Classification number
P3 [Geophysics]; P59 [Geochemistry];
Discipline classification code
0708; 070902;
Abstract
High spatial resolution remote sensing images (HRRSIs) contain intricate details and varied spectral distributions, making their semantic segmentation a challenging task. To address this problem, it is crucial to adequately capture both local and global contexts to reduce semantic ambiguity. While self-attention modules in vision transformers capture long-range context, they tend to sacrifice local details. In this article, we propose a geometric prior-guided interactive network (GPINet), a hybrid network that refines features across encoder and decoder stages. First, a dual-branch encoder with local-global interaction modules (LGIMs) is designed to fully exploit local and global contexts for feature refinement. Unlike commonly used skip connections or concatenations, the LGIMs bilaterally couple and exchange CNN features with transformer features through lossless transformation and elaborate cross-attention. Moreover, we introduce a geometric prior generation module (GPGM) that iteratively updates a randomly initialized geometric prior. Subsequently, the geometric priors are stored and used to guide feature recovery. Finally, a weighted summation is applied to the upsampled decoded features and the geometric priors. By comprehensively capturing contexts and enabling lossless decoding and deterministic inference, GPINet learns discriminative representations for accurately specifying pixel-level semantics. Experiments on three benchmark datasets demonstrate the superiority of the proposed GPINet over state-of-the-art methods. Furthermore, we validate the effectiveness of geometric priors and compare the model sizes.
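The two mechanisms named in the abstract, bilateral exchange between CNN and transformer features via cross-attention and a weighted summation of decoded features with a prior, can be illustrated with a minimal pure-Python sketch. This is not the authors' LGIM or decoder: the function names (`cross_attend`, `bilateral_exchange`, `weighted_sum`), the identity projections in place of learned query/key/value weights, the residual update, and the mixing weight `alpha` are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Scaled dot-product attention: each query token attends over keys_values.

    Assumption: identity projections (no learned W_q, W_k, W_v), so the same
    vectors serve as both keys and values.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[j] for w, kv in zip(weights, keys_values))
                    for j in range(d)])
    return out


def bilateral_exchange(cnn_tokens, vit_tokens):
    """Refine each branch with a residual cross-attention over the other branch.

    Hypothetical stand-in for a local-global interaction step.
    """
    cnn_ctx = cross_attend(cnn_tokens, vit_tokens)   # local queries, global context
    vit_ctx = cross_attend(vit_tokens, cnn_tokens)   # global queries, local context
    cnn_out = [[a + b for a, b in zip(t, c)] for t, c in zip(cnn_tokens, cnn_ctx)]
    vit_out = [[a + b for a, b in zip(t, c)] for t, c in zip(vit_tokens, vit_ctx)]
    return cnn_out, vit_out


def weighted_sum(decoded, prior, alpha=0.5):
    """Elementwise alpha * decoded + (1 - alpha) * prior (alpha is hypothetical)."""
    return [[alpha * d + (1 - alpha) * p for d, p in zip(d_row, p_row)]
            for d_row, p_row in zip(decoded, prior)]
```

For example, `bilateral_exchange([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]])` lets two local tokens read from one global token and vice versa; the two branches need not have the same number of tokens, only the same feature dimension.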
Pages: 1-18
Number of pages: 18