Context Enhancing Representation for Semantic Segmentation in Remote Sensing Images

Cited by: 19
Authors
Fang, Leyuan [1, 2]
Zhou, Peng [1, 3]
Liu, Xinxin [1, 3]
Ghamisi, Pedram [4, 5]
Chen, Siwei [6]
Affiliations
[1] Hunan Univ, Coll Elect & Informat Engn, Changsha 410082, Hunan, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518000, Peoples R China
[3] Hunan Univ, Key Lab Visual Percept & Artificial Intelligence, Changsha 410082, Hunan, Peoples R China
[4] Helmholtz Inst Freiberg Resource Technol, Helmholtz Zentrum Dresden Rossendorf HZDR, D-09599 Freiberg, Germany
[5] Inst Adv Res Artificial Intelligence IARAI, A-1030 Vienna, Austria
[6] Natl Univ Def Technol, State Key Lab Complex Electromagnet Environm Effe, Changsha 410073, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Semantics; Context modeling; Image segmentation; Convolution; Decoding; Remote sensing; Predictive models; deep learning; feature alignment and enhancement; remote sensing images (RSIs); semantic segmentation;
DOI
10.1109/TNNLS.2022.3201820
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As the foundation of image interpretation, semantic segmentation is an active topic in the field of remote sensing. Given the complex combination of multiscale objects in remote sensing images (RSIs), the exploration and modeling of contextual information have become the key to accurately identifying objects at different scales. Although several methods have been proposed in the past decade, their modeling of global or local context remains insufficient, which easily results in the fragmentation of large-scale objects, the neglect of small-scale objects, and blurred boundaries. To address these issues, we propose a contextual representation enhancement network (CRENet) to strengthen the global context (GC) and local context (LC) modeling in high-level features. The core components of the CRENet are the local feature alignment enhancement module (LFAEM) and the superpixel affinity loss (SAL). The LFAEM aligns and enhances the LC in low-level features by constructing contextual contrast through multilayer cascaded deformable convolution; the enhanced LC is then used to supplement the high-level features and refine the segmentation map. The SAL helps the network accurately capture the GC by supervising the semantic information and relationships learned from superpixels. The proposed method is plug-and-play and can be embedded in any FCN-based network. Experiments on two popular RSI datasets demonstrate the effectiveness of the proposed network, which achieves competitive performance both qualitatively and quantitatively.
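As a rough illustration of the kind of supervision the SAL is described as using, the sketch below builds a binary pairwise affinity target from a superpixel label map and scores a predicted affinity matrix against it with binary cross-entropy. This is an assumption for illustration only: the abstract does not define the loss, and the function names superpixel_affinity and affinity_bce, the binary same-superpixel target, and the cross-entropy form are hypothetical choices, not the authors' formulation. In practice the label map might come from a superpixel method such as SLIC.

```python
import math
from typing import List, Sequence


def superpixel_affinity(labels: Sequence[Sequence[int]]) -> List[List[int]]:
    """Hypothetical target: entry (i, j) is 1 when pixels i and j
    (in row-major order) carry the same superpixel label, else 0."""
    flat = [lab for row in labels for lab in row]
    return [[1 if a == b else 0 for b in flat] for a in flat]


def affinity_bce(
    pred: Sequence[Sequence[float]],
    target: Sequence[Sequence[int]],
    eps: float = 1e-7,
) -> float:
    """Mean binary cross-entropy between a predicted affinity matrix
    (values in [0, 1]) and the binary target. Assumed loss form."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
            count += 1
    return total / count


# Toy 2x2 label map: the top row is one superpixel, the bottom row another.
labels = [[0, 0], [1, 1]]
target = superpixel_affinity(labels)

# An uninformative prediction (0.5 everywhere) gives a loss of log(2).
uniform = [[0.5] * 4 for _ in range(4)]
print(affinity_bce(uniform, target))
```

A dense pairwise matrix grows quadratically with the number of pixels, so a practical version would likely operate on downsampled feature maps or on superpixel-level features rather than on every pixel pair.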
Pages: 4138-4152
Number of pages: 15