BS2T: Bottleneck Spatial-Spectral Transformer for Hyperspectral Image Classification

Times cited: 67

Authors:

Song, Ruoxi [1,2]

Feng, Yining [1]

Cheng, Wei [3]

Mu, Zhenhua [1]

Wang, Xianghai [1,3]

Affiliations:

[1] Liaoning Normal Univ, Sch Geog, Dalian 116029, Peoples R China

[2] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100101, Peoples R China

[3] Liaoning Normal Univ, Sch Comp & Informat Technol, Dalian 116029, Peoples R China

Source:

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2022, Vol. 60

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Image classification; Transformers; Convolutional neural networks; Three-dimensional displays; Neural networks; Task analysis; 3-D convolutional neural network (CNN); bottleneck spatial-spectral transformer (BS2T); dual-branch neural network; hyperspectral (HS) image classification; multihead self-attention (MHSA); positional encoding; REPRESENTATION; NETWORK; CNN;

DOI:

10.1109/TGRS.2022.3185640

Chinese Library Classification (CLC) codes:

P3 [Geophysics]; P59 [Geochemistry];

Subject classification codes:

0708 ; 070902 ;

Abstract:

Convolutional neural networks (CNNs) have been extensively applied to hyperspectral (HS) image classification tasks and achieved promising performance. However, for CNN-based HS image classification methods, it is hard to depict the dependencies among HS image pixels in long-range distanced positions and bands. Moreover, the limited receptive field of the convolutional layers extremely hinders the development of the CNN structure. To tackle these problems, in this article, the novel bottleneck spatial-spectral transformer (BS2T) is proposed to depict the long-range global dependencies of HS image pixels, which can be regarded as a feature extraction module for HS image classification networks. More specifically, inspired by bottleneck transformer in computer vision, for HS image feature extraction, the proposed BS2T is incorporated with a feature contraction module, a multihead spatial-spectral self-attention (MHS2A) module, and a feature expansion module. In this way, convolutional operations are replaced by the MHS2A to capture the long-range dependency of HS pixels regardless of their spatial position and distance. Meanwhile, in the MHS2A module, to highlight the spectral features of HS images, we introduce the spectral information and content spatial positional information to classical multihead self-attention to make the attention more positional aware and spectral aware. On this basis, a dual-branch HS image classification framework based on 3-D CNN and BS2T is defined for jointly extracting the local-global features of HS images. Experimental results on three public HS image classification datasets show that the proposed classification framework achieves a significant improvement when compared with the state-of-the-art methods. The source code of the proposed framework can be downloaded from https://github.com/srxlnnu/BS2T.
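The bottleneck structure named in the abstract (feature contraction, self-attention with positional information, feature expansion) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available at the repository linked above. It assumes a single attention head, plain linear maps in place of convolutions, a randomly initialized position table, and a residual connection. The spectral-aware term of MHS2A is left out because the abstract does not specify its form. All function names, parameter names, and sizes below are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def softmax_rows(m):
    """Apply a numerically stable softmax to every row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def random_matrix(rows, cols, rng, scale=0.1):
    """Gaussian-initialized matrix (stand-in for learned weights)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def attention_with_position(x, wq, wk, wv, pos):
    """Single-head self-attention whose logits add a content-position term."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    content = matmul(q, transpose(k))     # content-content logits, n x n
    position = matmul(q, transpose(pos))  # content-position logits, n x n
    logits = [
        [(c + p) / math.sqrt(d) for c, p in zip(c_row, p_row)]
        for c_row, p_row in zip(content, position)
    ]
    weights = softmax_rows(logits)        # every token attends to every token
    return matmul(weights, v), weights


def init_params(n_tokens, c_in, c_mid, seed=0):
    """Hypothetical parameter set: contraction, attention, expansion."""
    rng = random.Random(seed)
    return {
        "contract": random_matrix(c_in, c_mid, rng),
        "wq": random_matrix(c_mid, c_mid, rng),
        "wk": random_matrix(c_mid, c_mid, rng),
        "wv": random_matrix(c_mid, c_mid, rng),
        "pos": random_matrix(n_tokens, c_mid, rng),
        "expand": random_matrix(c_mid, c_in, rng),
    }


def bottleneck_block(x, params):
    """Contract channels, apply global attention, expand, add residual."""
    h = matmul(x, params["contract"])  # c_in -> c_mid
    h, weights = attention_with_position(
        h, params["wq"], params["wk"], params["wv"], params["pos"]
    )
    y = matmul(h, params["expand"])    # c_mid -> c_in
    out = [[a + b for a, b in zip(x_row, y_row)] for x_row, y_row in zip(x, y)]
    return out, weights


# Toy input: a 3 x 3 pixel patch flattened to 9 tokens with 8 features each.
patch = random_matrix(9, 8, random.Random(1), scale=1.0)
out, weights = bottleneck_block(patch, init_params(n_tokens=9, c_in=8, c_mid=4))
print(len(out), len(out[0]))
```

Because the attention weights form a full token-by-token matrix, each output token can draw on every other position in the patch, which is the long-range property the abstract contrasts with the limited receptive field of convolutional layers.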

Pages: 17

References: 57 in total
