Deep Symmetric Fusion Transformer for Multimodal Remote Sensing Data Classification

Cited by: 3
Authors
Chang, Honghao [1]
Bi, Haixia [1]
Li, Fan [1]
Xu, Chen [2, 3]
Chanussot, Jocelyn [4]
Hong, Danfeng [5, 6]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Informat & Commun Engn, Xian 710049, Peoples R China
[2] Peng Cheng Lab, Dept Math & Fundamental Res, Shenzhen 518055, Peoples R China
[3] Xi An Jiao Tong Univ, Sch Math & Stat, Xian 710049, Peoples R China
[4] Univ Grenoble Alpes, CNRS, INRIA, Grenoble INP LJK, F-38000 Grenoble, France
[5] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100049, Peoples R China
[6] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 100049, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024 / Vol. 62
Keywords
Land-cover classification; local-global mixture (LGM); multimodal feature fusion; remote sensing; symmetric fusion transformer (SFT); LAND-COVER CLASSIFICATION; LIDAR DATA;
DOI
10.1109/TGRS.2024.3476975
Chinese Library Classification number
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification code
0708; 070902
Abstract
In recent years, multimodal remote sensing data classification (MMRSC) has evoked growing attention due to its more comprehensive and accurate delineation of Earth's surface compared to its single-modal counterpart. However, it remains challenging to capture and integrate local and global features from single-modal data. Moreover, how to fully excavate and exploit the interactions between different modalities is still an intricate issue. To this end, we propose a novel dual-branch transformer-based framework named deep symmetric fusion transformer (DSymFuser). Within the framework, each branch contains a stack of local-global mixture (LGM) blocks, to extract hierarchical and discriminative single-modal features. In each LGM block, a local-global feature mixer with learnable weights is specifically devised to adaptively aggregate the local and global features extracted with a convolutional neural network (CNN)-transformer network. Furthermore, we innovatively design a symmetric fusion transformer (SFT) that trails behind each LGM block. The elaborately designed SFT symmetrically facilitates cross-modal correlation excavation, comprehensively exploiting the complementary cues underlying heterogeneous modalities. The hierarchical construction of the LGM and SFT blocks enables feature extraction and fusion in a multilevel manner, further promoting the completeness and descriptiveness of the learned features. We conducted extensive ablation studies and comparative experiments on three benchmark datasets, and the experimental results validated the effectiveness and superiority of the proposed method. The source code of the proposed method will be available publicly at https://github.com/HaixiaBi1982/DSymFuser.
Pages: 15