MSTNet: A Multilevel Spectral-Spatial Transformer Network for Hyperspectral Image Classification

Cited by: 92
Authors
Yu, Haoyang [1]
Xu, Zhen [1]
Zheng, Ke [2]
Hong, Danfeng [3]
Yang, Hao [1]
Song, Meiping [1]
Affiliations
[1] Dalian Maritime Univ, Ctr Hyperspectral Imaging Remote Sensing CHIRS, Informat Sci & Technol Coll, Dalian 116026, Peoples R China
[2] Liaocheng Univ, Coll Geog & Environm, Liaocheng 252059, Shandong, Peoples R China
[3] Chinese Acad Sci, Aerosp Informat Res Inst, Key Lab Computat Opt Imaging Technol, Beijing 100094, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2022 / Vol. 60
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Transformers; Feature extraction; Convolutional neural networks; Hyperspectral imaging; Training; Data mining; Task analysis; Convolutional neural networks (CNNs); hyperspectral image (HSI); image-based classification; transformer; LEARNING APPROACH; KERNEL; SVM;
DOI
10.1109/TGRS.2022.3186400
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry];
Discipline classification code
0708 ; 070902 ;
Abstract
Convolutional neural networks (CNNs) have been widely used in hyperspectral image classification (HSIC). Although current CNN-based methods achieve good performance, they still face several challenges: the receptive field is limited, information is lost in down-sampling layers, and deep networks consume substantial computing resources. To overcome these problems, we propose a multilevel spectral-spatial transformer network (MSTNet) for HSIC. MSTNet is built on an image-based classification framework, which is efficient and straightforward. Based on this framework, we design a self-attentive encoder. First, HSIs are processed into sequences. Meanwhile, a learned positional embedding (PE) is added to integrate spatial information. Then, a pure transformer encoder (TE) is employed to learn feature representations. Finally, the multilevel features are processed by decoders to generate the classification results at the original image size. Experimental results on three real hyperspectral datasets demonstrate the efficiency of the proposed method in comparison with related CNN-based methods.
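The pipeline outlined in the abstract (sequence construction, positional embedding, transformer encoding, multilevel decoding) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how sequences are formed or how the decoders work, so the non-overlapping patch tokens, single-head attention, the tapped layer indices in `levels`, the nearest-neighbour upsampling decoder, and every function name and hyperparameter below are assumptions made for illustration. Weights are random and untrained, so the predicted labels carry no meaning; only the data flow and shapes are shown.

```python
import numpy as np

rng = np.random.default_rng(0)  # random, untrained weights (illustration only)

def image_to_sequence(hsi, patch):
    """Split an (H, W, B) cube into N non-overlapping patch tokens (assumed tokenization)."""
    H, W, B = hsi.shape
    assert H % patch == 0 and W % patch == 0, "H and W must be multiples of patch"
    gh, gw = H // patch, W // patch
    tokens = hsi.reshape(gh, patch, gw, patch, B).transpose(0, 2, 1, 3, 4)
    return tokens.reshape(gh * gw, patch * patch * B), (gh, gw)

def layer_norm(x, eps=1e-6):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def init_layer(d):
    """Random parameters for one encoder layer with model width d."""
    shapes = {"wq": (d, d), "wk": (d, d), "wv": (d, d), "wo": (d, d),
              "w1": (d, 2 * d), "w2": (2 * d, d)}
    return {k: rng.normal(0.0, 1.0 / np.sqrt(s[0]), s) for k, s in shapes.items()}

def encoder_layer(x, p):
    """Pre-norm transformer layer: single-head self-attention, then a ReLU MLP."""
    h = layer_norm(x)
    q, k, v = h @ p["wq"], h @ p["wk"], h @ p["wv"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N, N): every token attends to all tokens
    x = x + (attn @ v) @ p["wo"]
    h = layer_norm(x)
    return x + np.maximum(h @ p["w1"], 0.0) @ p["w2"]

def classify(hsi, patch=4, d=32, depth=4, levels=(1, 3), n_classes=5):
    """Return an (H, W) label map; `levels` are the encoder layers tapped as multilevel features."""
    tokens, (gh, gw) = image_to_sequence(hsi, patch)
    w_embed = rng.normal(0.0, 0.02, (tokens.shape[1], d))
    pos = rng.normal(0.0, 0.02, (tokens.shape[0], d))  # stands in for a learned PE
    x = tokens @ w_embed + pos
    feats = []
    for i in range(depth):
        x = encoder_layer(x, init_layer(d))
        if i in levels:
            feats.append(x)
    fused = np.concatenate(feats, axis=-1)  # (N, d * len(levels))
    w_cls = rng.normal(0.0, 0.02, (fused.shape[1], n_classes))
    logits = (fused @ w_cls).reshape(gh, gw, n_classes)
    # Assumed decoder: nearest-neighbour upsampling back to the input resolution.
    logits = logits.repeat(patch, axis=0).repeat(patch, axis=1)
    return logits.argmax(-1)
```

Calling `classify` on a 16 x 16 cube with 10 bands returns a 16 x 16 integer label map. Because attention spans all tokens, each layer has a global receptive field, which is the property the abstract contrasts with CNNs.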
Pages: 13