Effective Fusion of Multi-Modal Remote Sensing Data in a Fully Convolutional Network for Semantic Labeling

Cited by: 29
Authors
Zhang, Wenkai [1, 2]
Huang, Hai [3]
Schmitz, Matthias [3]
Sun, Xian [1]
Wang, Hongqi [1]
Mayer, Helmut [3]
Affiliations
[1] Chinese Acad Sci, Inst Elect, Key Lab Spatial Informat Proc & Applicat Syst Tec, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 100049, Peoples R China
[3] Bundeswehr Univ Munich, Inst Appl Comp Sci, Werner Heisenberg Weg 39, D-85577 Neubiberg, Germany
Funding
National Natural Science Foundation of China;
Keywords
semantic labeling; Fully Convolutional Networks; multi-modal dataset; fusion nets;
DOI
10.3390/rs10010052
Chinese Library Classification number
X [Environmental Science, Safety Science];
Discipline classification code
08; 0830;
Abstract
In recent years, Fully Convolutional Networks (FCNs) have greatly improved semantic labeling for various applications, including multi-modal remote sensing data. Although different fusion strategies have been reported for multi-modal data, there is no in-depth study of the reasons for their performance limits. For example, it is unclear why an early fusion of multi-modal data in an FCN does not lead to a satisfactory result. In this paper, we investigate the contribution of individual layers inside an FCN and propose an effective fusion strategy for the semantic labeling of color or infrared imagery together with elevation data (e.g., Digital Surface Models). The sensitivity and contribution of layers with respect to classes and multi-modal data are quantified by recall and the descent rate of recall in a multi-resolution model. The contribution of different modalities to the pixel-wise prediction is analyzed, which explains the poor performance caused by the plain concatenation of different modalities. Finally, based on this analysis, an optimized scheme is derived for fusing layers with image and elevation information into a single FCN model. Experiments are performed on the ISPRS Vaihingen 2D Semantic Labeling dataset (infrared and RGB imagery as well as elevation) and the Potsdam dataset (RGB imagery and elevation). Comprehensive evaluations demonstrate the potential of the proposed approach.
Pages: 14