MHIAIFormer: Multihead Interacted and Adaptive Integrated Transformer With Spatial-Spectral Attention for Hyperspectral Image Classification

Cited by: 1
Authors
Kong, Delong [1]
Zhang, Jiahua [1, 2]
Zhang, Shichao [1]
Yu, Xiang [1]
Prodhan, Foyez Ahmed [3]
Affiliations
[1] Qingdao Univ, Coll Comp Sci & Technol, Remote Sensing Informat & Digital Earth Ctr, Qingdao 266071, Peoples R China
[2] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100094, Peoples R China
[3] Bangabandhu Sheikh Mujibur Rahman Agr Univ, Dept Agr Extens & Rural Dev, Gazipur 1706, Bangladesh
Keywords
Transformers; Feature extraction; Head; Data mining; Attention mechanisms; Adaptation models; Task analysis; Deep learning (DL); hyperspectral image (HSI) classification; multihead interacted and adaptive integrated transformer (MHIAIFormer); multihead self-attention (MHSA); FRAMEWORK; NETWORK;
DOI
10.1109/JSTARS.2024.3441111
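Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;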
Abstract
Deep learning is an effective approach to hyperspectral image (HSI) classification, and both CNN-based and Transformer-based methods have achieved excellent performance. However, existing CNN-based and Transformer-based HSI classification approaches have several drawbacks: 1) CNN-based methods are limited in extracting multiscale and localized features owing to the fixed-size input patch. 2) The multihead self-attention (MHSA) module ignores the interaction between its attention heads, which leads to insufficient feature fusion across directions. 3) MHSA disregards the weights of the attention heads in the various directions and simply concatenates the heads horizontally. To address these limitations, this study proposes a novel multihead interacted and adaptive integrated transformer (MHIAIFormer) with spatial-spectral attention, which integrates the respective advantages of convolutions and transformers. A pyramidal spatial-spectral attention (PS2A) feature extraction module is adopted to efficiently capture the localized and multiscale feature information of HSI. The output of PS2A is then sent to the transformer encoder stage through a grouped multiscale cross-dimension embedding module, which includes additive self-attention using multihead interaction and MHSA with adaptive multihead merging to capture the long-range dependencies of the features. Extensive experiments on four datasets verify that the proposed approach achieves better classification accuracy than state-of-the-art models. The proposed model achieved overall accuracies of 95.97%, 98.68%, 92.68%, and 99.49% on the four datasets.
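The abstract contrasts plain horizontal concatenation of attention heads with adaptive multihead merging, but this record does not give the paper's actual formulation. The sketch below is only an illustration of the general idea, under stated assumptions: each head's output is scaled by a softmax-normalized weight before concatenation. The function name `mhsa_weighted_merge`, the `head_logits` parameter, and the softmax weighting scheme are all hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def mhsa_weighted_merge(tokens, w_q, w_k, w_v, head_logits, num_heads):
    """Multihead self-attention with per-head weights applied before merging.

    Hypothetical illustration, not the MHIAIFormer implementation.
    tokens:      (n, d) token embeddings
    w_q/w_k/w_v: (d, d) projection matrices
    head_logits: (num_heads,) scores that would be learned in practice (assumed)
    """
    n, d = tokens.shape
    d_head = d // num_heads

    def split(x):
        # (n, d) -> (num_heads, n, d_head)
        return x.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(tokens @ w_q), split(tokens @ w_k), split(tokens @ w_v)

    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (h, n, n)
    heads = softmax(scores, axis=-1) @ v                 # (h, n, d_head)

    # Assumed merging rule: scale each head by a normalized weight,
    # then concatenate along the feature axis.
    head_weights = softmax(head_logits)                  # (h,)
    heads = heads * head_weights[:, None, None]
    merged = heads.transpose(1, 0, 2).reshape(n, d)
    return merged, head_weights
```

With all `head_logits` equal, every head receives the same weight and the result is a uniformly scaled version of ordinary concatenation; unequal logits let some heads contribute more than others.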
Pages: 14486-14501
Number of pages: 16