More Diverse Means Better: Multimodal Deep Learning Meets Remote-Sensing Imagery Classification

Cited by: 1051
Authors
Hong, Danfeng [1]
Gao, Lianru [2]
Yokoya, Naoto [3, 4]
Yao, Jing [5]
Chanussot, Jocelyn [6, 7]
Du, Qian [8]
Zhang, Bing [2, 9]
Affiliations
[1] Univ Grenoble Alpes, CNRS, Grenoble INP, GIPSA Lab, F-38000 Grenoble, France
[2] Chinese Acad Sci, Aerosp Informat Res Inst, Key Lab Digital Earth Sci, Beijing 100094, Peoples R China
[3] Univ Tokyo, Grad Sch Frontier Sci, Chiba 2778561, Japan
[4] RIKEN, RIKEN Ctr Adv Intelligence Project AIP, Geoinformat Unit, Tokyo 1030027, Japan
[5] Xi An Jiao Tong Univ, Sch Math & Stat, Xian 710049, Peoples R China
[6] Univ Grenoble Alpes, INRIA, CNRS, Grenoble INP, LJK, F-38000 Grenoble, France
[7] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100094, Peoples R China
[8] Mississippi State Univ, Dept Elect & Comp Engn, Starkville, MS 39762 USA
[9] Univ Chinese Acad Sci, Coll Resources & Environm, Beijing 100049, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2021 / Vol. 59 / No. 05
Funding
National Natural Science Foundation of China; Japan Society for the Promotion of Science;
Keywords
Classification; convolutional neural networks (CNNs); cross modality; deep learning (DL); feature learning; fusion; hyperspectral; light detection and ranging (LiDAR); multimodal; multispectral; network architecture; remote sensing (RS); synthetic aperture radar (SAR); CONVOLUTIONAL NEURAL-NETWORK; LAND-COVER; DATA FUSION; LIDAR DATA; MANIFOLD ALIGNMENT; FRAMEWORK;
DOI
10.1109/TGRS.2020.3016820
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry];
Discipline classification code
0708 ; 070902 ;
Abstract
Classification and identification of the materials lying over or beneath the earth's surface have long been a fundamental but challenging research topic in geoscience and remote sensing (RS), and have attracted growing attention owing to recent advances in deep learning techniques. Although deep networks have been successfully applied in single-modality-dominated classification tasks, their performance inevitably hits a bottleneck in complex scenes that need to be finely classified, because of limited information diversity. In this work, we provide a baseline solution to this difficulty by developing a general multimodal deep learning (MDL) framework. In particular, we also investigate a special case of multi-modality learning (MML), namely cross-modality learning (CML), which exists widely in RS image classification applications. By focusing on "what," "where," and "how" to fuse, we show different fusion strategies as well as how to train deep networks and build the network architecture. Specifically, five fusion architectures are introduced and developed, and are further unified in our MDL framework. More significantly, our framework is not limited to pixel-wise classification tasks but is also applicable to spatial information modeling with convolutional neural networks (CNNs). To validate the effectiveness and superiority of the MDL framework, extensive experiments related to the settings of MML and CML are conducted on two different multimodal RS data sets. Furthermore, the codes and data sets will be available at https://github.com/danfenghong/IEEE_TGRS_MDLRS, contributing to the RS community.
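The "where to fuse" question raised in the abstract can be illustrated with a minimal sketch. The abstract does not name the five fusion architectures, so the code below is not the authors' method: it only contrasts two generic fusion points (input-level "early" fusion and feature-level "late" fusion) for a single pixel. The band counts, layer sizes, random weights, and all function names are hypothetical choices made for illustration.

```python
import random


def init_layer(n_in, n_out, rng):
    """Create small random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def dense(x, weights, bias):
    """Apply one fully connected layer followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def early_fusion(x_a, x_b, layer):
    """Concatenate the raw inputs of both modalities, then apply one shared layer."""
    return dense(x_a + x_b, *layer)


def late_fusion(x_a, x_b, layer_a, layer_b):
    """Apply one layer per modality, then concatenate the resulting features."""
    return dense(x_a, *layer_a) + dense(x_b, *layer_b)


rng = random.Random(0)
hsi_pixel = [rng.random() for _ in range(8)]    # hypothetical 8-band spectral pixel
lidar_pixel = [rng.random() for _ in range(2)]  # hypothetical 2 LiDAR-derived features

shared = init_layer(10, 4, rng)   # sees all 8 + 2 inputs at once
branch_a = init_layer(8, 4, rng)  # spectral branch only
branch_b = init_layer(2, 4, rng)  # LiDAR branch only

early = early_fusion(hsi_pixel, lidar_pixel, shared)
late = late_fusion(hsi_pixel, lidar_pixel, branch_a, branch_b)
print(len(early), len(late))
```

In this sketch the early-fusion feature has 4 entries and the late-fusion feature has 8, since the latter keeps a separate 4-dimensional representation per modality before concatenation. A classifier head and training loop are deliberately left out.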
Pages: 4340-4354
Number of pages: 15