Cross-city matters: A multimodal remote sensing benchmark dataset for cross-city semantic segmentation using high-resolution domain adaptation networks

Cited by: 309
Authors
Hong, Danfeng [1]
Zhang, Bing [1,2]
Li, Hao [3]
Li, Yuxuan [1,4]
Yao, Jing [1]
Li, Chenyu [5]
Werner, Martin [3]
Chanussot, Jocelyn [6]
Zipf, Alexander [7]
Zhu, Xiao Xiang [8]
Affiliations
[1] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100094, Peoples R China
[2] Univ Chinese Acad Sci, Coll Resources & Environm, Beijing 100049, Peoples R China
[3] Tech Univ Munich, Big Geospatial Data Management, D-85521 Munich, Germany
[4] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 100049, Peoples R China
[5] Southeast Univ, Sch Math, Nanjing 210096, Peoples R China
[6] Univ Grenoble Alpes, GIPSA Lab, CNRS, Grenoble INP, F-38000 Grenoble, France
[7] Heidelberg Univ, Inst Geog, D-69120 Heidelberg, Germany
[8] Tech Univ Munich, Data Sci Earth Observat, D-80333 Munich, Germany
Funding
National Natural Science Foundation of China;
Keywords
Cross-city; Deep learning; Dice loss; Domain adaptation; High-resolution network; Land cover; Multimodal benchmark datasets; Remote sensing; Segmentation
DOI
10.1016/j.rse.2023.113856
Chinese Library Classification
X [Environmental Science, Safety Science];
Subject Classification Code
08; 0830;
Abstract
Artificial intelligence (AI) approaches have achieved remarkable success in single-modality-dominated remote sensing (RS) applications, particularly those focused on individual urban environments (e.g., single cities or regions). Yet these AI models tend to hit a performance bottleneck in case studies spanning cities or regions, owing to the lack of diverse RS information and of cutting-edge solutions with strong generalization ability. To this end, we build a new set of multimodal remote sensing benchmark datasets (including hyperspectral, multispectral, and SAR data) for the cross-city semantic segmentation task (called the C2Seg dataset), which consists of two cross-city scenes: Berlin-Augsburg (in Germany) and Beijing-Wuhan (in China). Going beyond the single city, we propose a high-resolution domain adaptation network, HighDAN for short, to promote the generalization ability of AI models across multi-city environments. HighDAN not only retains the spatial-topological structure of the studied urban scene through a parallel high-to-low resolution fusion scheme, but also closes the gap arising from the large differences in RS image representations between cities by means of adversarial learning. In addition, the Dice loss is adopted in HighDAN to alleviate the class imbalance issue caused by cross-city factors. Extensive experiments conducted on the C2Seg dataset show the superiority of HighDAN over state-of-the-art competitors in terms of segmentation performance and generalization ability. The C2Seg dataset and the semantic segmentation toolbox (including the proposed HighDAN) will be made publicly available at https://github.com/danfenghong/RSE_Cross-city.
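To make two ingredients highlighted in the abstract more concrete, the sketch below illustrates, in PyTorch-style Python, a multi-class soft Dice loss and a gradient-reversal-based domain discriminator of the kind commonly used for adversarial domain adaptation. This is a minimal sketch, not the authors' released implementation: the tensor shapes, channel widths, and discriminator design are illustrative assumptions.

    # Minimal sketch (not the authors' released code) of two components the
    # abstract describes: a soft Dice loss for class-imbalanced segmentation
    # and a gradient-reversal layer for adversarial domain adaptation.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class SoftDiceLoss(nn.Module):
        """Multi-class soft Dice loss; compared with plain cross-entropy it
        down-weights dominant classes, which helps with cross-city imbalance."""

        def __init__(self, eps: float = 1e-6):
            super().__init__()
            self.eps = eps

        def forward(self, logits, target):
            # logits: (B, C, H, W); target: (B, H, W) integer class labels
            num_classes = logits.shape[1]
            probs = torch.softmax(logits, dim=1)
            one_hot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
            intersection = (probs * one_hot).sum(dim=(0, 2, 3))
            cardinality = (probs + one_hot).sum(dim=(0, 2, 3))
            dice = (2.0 * intersection + self.eps) / (cardinality + self.eps)
            return 1.0 - dice.mean()


    class GradientReversal(torch.autograd.Function):
        """Identity in the forward pass, negated (scaled) gradient in the
        backward pass: the backbone learns city-invariant features while the
        domain classifier tries to tell source from target city apart."""

        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.clone()

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lam * grad_output, None


    class DomainDiscriminator(nn.Module):
        """Toy patch-level domain classifier over fused feature maps
        (channel width 256 is an assumed value)."""

        def __init__(self, in_channels: int = 256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_channels, 128, kernel_size=3, padding=1),
                nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(128, 1, kernel_size=1),  # 1 = source, 0 = target
            )

        def forward(self, features, lam: float = 1.0):
            reversed_feats = GradientReversal.apply(features, lam)
            return self.net(reversed_feats)

In a typical training loop, the Dice (or Dice-plus-cross-entropy) segmentation loss computed on the labeled source city would be combined with the domain-classification loss computed on features from both cities, so that the shared backbone is pushed toward representations that transfer across cities.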
Pages: 17