GAF-Net: Improving the Performance of Remote Sensing Image Fusion using Novel Global Self and Cross Attention Learning

Cited by: 26
Authors
Jha, Ankit [1]
Bose, Shirsha [2]
Banerjee, Biplab [1]
Affiliations
[1] Indian Inst Technol, Bombay, Maharashtra, India
[2] Tech Univ Munich, Munich, Germany
Source
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) | 2023
Keywords
LAND-COVER CLASSIFICATION
DOI
10.1109/WACV56688.2023.00629
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Self- and cross-attention learning has been found to substantially boost the performance of remote sensing (RS) image fusion. However, self-attention models fail to incorporate the global context because of the limited size of their receptive fields, while cross-attention learning may generate ambiguous features because the feature extractors for all the modalities are jointly trained. This results in redundant multi-modal features, which limits the fusion performance. To address these issues, we propose a novel fusion architecture called Global Attention based Fusion Network (GAF-Net), equipped with novel self- and cross-attention learning techniques. We introduce a within-modality feature refinement module based on global spectral-spatial attention learning with query-key-value processing, in which both the global spatial and channel contexts are used to generate two channel attention masks. Since it is non-trivial to generate the cross-attention from within the fusion network, we propose to leverage two auxiliary tasks of modality-specific classification to produce highly discriminative cross-attention masks. Finally, to ensure non-redundancy, we propose to penalize high correlation between the attended modality-specific features. Extensive experiments on five benchmark datasets, covering optical, multispectral (MS), hyperspectral (HSI), light detection and ranging (LiDAR), synthetic aperture radar (SAR), and audio modalities, establish the superiority of GAF-Net over existing methods in the literature.
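The non-redundancy idea in the abstract (penalizing high correlation between the attended features of two modalities) can be illustrated with a small sketch. This is not the authors' loss: the abstract does not give its exact form, so the code below assumes a mean squared Pearson correlation over all channel pairs. The function names `pearson` and `correlation_penalty` and the toy feature matrices are hypothetical.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation between two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    # Small constant guards against division by zero for constant channels.
    return cov / (norm_u * norm_v + 1e-12)


def correlation_penalty(feat_a, feat_b):
    """Assumed penalty: mean squared correlation over all channel pairs.

    feat_a, feat_b: lists of samples, each sample a list of channel values,
    standing in for the attended features of two modalities.
    """
    cols_a = list(zip(*feat_a))  # one tuple per channel of modality A
    cols_b = list(zip(*feat_b))  # one tuple per channel of modality B
    total = sum(pearson(ca, cb) ** 2 for ca in cols_a for cb in cols_b)
    return total / (len(cols_a) * len(cols_b))


# Toy data: 256 samples with 4 channels per modality.
rng = random.Random(0)
x = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(256)]
y = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(256)]
x_noisy = [[v + 0.01 * rng.gauss(0, 1) for v in row] for row in x]

redundant = correlation_penalty(x, x_noisy)  # near-duplicate features
independent = correlation_penalty(x, y)      # unrelated features
```

Under this assumed formulation, near-duplicate features receive a much larger penalty than unrelated ones, so adding the term to a training objective would push the two modality branches toward complementary representations.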
Pages: 6343-6352
Number of pages: 10