Cross-city matters: A multimodal remote sensing benchmark dataset for cross-city semantic segmentation using high-resolution domain adaptation networks

Cited by: 369
Authors
Hong, Danfeng [1]
Zhang, Bing [1, 2]
Li, Hao [3]
Li, Yuxuan [1, 4]
Yao, Jing [1]
Li, Chenyu [5]
Werner, Martin [3]
Chanussot, Jocelyn [6]
Zipf, Alexander [7]
Zhu, Xiao Xiang [8]
Affiliations
[1] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100094, Peoples R China
[2] Univ Chinese Acad Sci, Coll Resources & Environm, Beijing 100049, Peoples R China
[3] Tech Univ Munich, Big Geospatial Data Management, D-85521 Munich, Germany
[4] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 100049, Peoples R China
[5] Southeast Univ, Sch Math, Nanjing 210096, Peoples R China
[6] Univ Grenoble Alpes, GIPSA Lab, CNRS, Grenoble INP, F-38000 Grenoble, France
[7] Heidelberg Univ, Inst Geog, D-69120 Heidelberg, Germany
[8] Tech Univ Munich, Data Sci Earth Observat, D-80333 Munich, Germany
Funding
National Natural Science Foundation of China
Keywords
Cross-city; Deep learning; Dice loss; Domain adaptation; High-resolution network; Land cover; Multimodal benchmark datasets; Remote sensing; Segmentation; COVER
DOI
10.1016/j.rse.2023.113856
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Subject classification code
08; 0830
Abstract
Artificial intelligence (AI) approaches have achieved remarkable success in single-modality-dominated remote sensing (RS) applications, especially those focused on individual urban environments (e.g., single cities or regions). Yet these AI models tend to hit a performance bottleneck in case studies across cities or regions, owing to the lack of diverse RS information and of cutting-edge solutions with high generalization ability. To this end, we build a new set of multimodal remote sensing benchmark datasets (including hyperspectral, multispectral, and SAR data) for the cross-city semantic segmentation task, called the C2Seg dataset, which consists of two cross-city scenes, i.e., Berlin-Augsburg (in Germany) and Beijing-Wuhan (in China). Going beyond the single city, we propose a high-resolution domain adaptation network, HighDAN for short, to improve the AI model's generalization ability in multi-city environments. HighDAN not only preserves the spatial topological structure of the studied urban scene through parallel high-to-low resolution fusion, but also closes the gap caused by large differences in RS image representations between cities by means of adversarial learning. In addition, HighDAN incorporates the Dice loss to alleviate the class imbalance caused by cross-city factors. Extensive experiments on the C2Seg dataset show the superiority of HighDAN in segmentation performance and generalization ability compared to state-of-the-art competitors. The C2Seg dataset and the semantic segmentation toolbox (including the proposed HighDAN) will be made publicly available at https://github.com/danfenghong/RSE_Cross-city.
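The abstract states that HighDAN uses the Dice loss to ease class imbalance, but this record does not give the paper's exact formulation (class weighting, smoothing term, or how it is combined with other losses). The following is therefore only a minimal sketch of a standard multi-class soft Dice loss, assuming per-pixel softmax probabilities and integer labels; the function name `soft_dice_loss` and the `eps` smoothing term are illustrative assumptions, not taken from the C2Seg toolbox.

```python
from typing import Sequence


def soft_dice_loss(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    num_classes: int,
    eps: float = 1e-6,
) -> float:
    """Hypothetical multi-class soft Dice loss, averaged over classes.

    probs:  one probability vector per pixel (assumed softmax-normalized).
    labels: one integer class index per pixel.
    eps:    assumed smoothing term that avoids division by zero.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    intersection = [0.0] * num_classes
    pred_sum = [0.0] * num_classes
    true_sum = [0.0] * num_classes
    for p, y in zip(probs, labels):
        for c in range(num_classes):
            g = 1.0 if y == c else 0.0  # one-hot ground truth for class c
            intersection[c] += p[c] * g
            pred_sum[c] += p[c]
            true_sum[c] += g
    # Per-class Dice score; every class counts equally in the mean,
    # regardless of how many pixels it covers.
    dice = [
        (2.0 * intersection[c] + eps) / (pred_sum[c] + true_sum[c] + eps)
        for c in range(num_classes)
    ]
    return 1.0 - sum(dice) / num_classes


if __name__ == "__main__":
    labels = [0, 0, 0, 1]  # class 1 is a rare (minority) class
    perfect = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    majority_only = [[1.0, 0.0]] * 4  # always predicts the majority class
    print(soft_dice_loss(perfect, labels, num_classes=2))
    print(soft_dice_loss(majority_only, labels, num_classes=2))
```

Predicting only the majority class still gives 75% pixel accuracy in this toy example, yet the loss is roughly 0.57 because the missed minority class contributes a near-zero Dice score with the same weight as the majority class. This per-class averaging is the usual reason Dice-style losses are used against class imbalance.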
Pages: 17