Multi-granularity vision transformer via semantic token for hyperspectral image classification

Cited by: 12

Authors:

Li, Bin [1]

Ouyang, Er [1]

Hu, Wenjing [1]

Zhang, Guoyun [1]

Zhao, Lin [1]

Wu, Jianhui [1]

Affiliations:

[1] Hunan Inst Sci & Technol, Sch Informat Sci & Engn, Yueyang 414000, Peoples R China

Source:

INTERNATIONAL JOURNAL OF REMOTE SENSING | 2022 / Vol. 43 / Issue 17

Keywords:

Hyperspectral image classification; convolutional neural networks; transformer; word embedding; long-distance dependence;

DOI:

10.1080/01431161.2022.2142078

Chinese Library Classification number:

TP7 [Remote sensing technology];

Discipline classification codes:

081102 ; 0816 ; 081602 ; 083002 ; 1404 ;

Abstract:

The superior local context modelling capability of convolutional neural networks (CNNs) in representing features allows greatly enhanced performance in hyperspectral image (HSI) classification tasks by CNN-based methods. However, most of these methods suffer from a restricted receptive field and poor performance in the continuous data domain. To address these issues, we propose a multi-granularity vision transformer via semantic token (MSTViT) for HSI classification, which differs from the existing transformer view by modelling the HSI classification tasks as word embedding problems. Specifically, the MSTViT model extracts multi-level semantic features by a ladder feature extractor and applies a multi-granularity patch embedding module to embed these features simultaneously as different-scale tokens. Moreover, different-granularity tokens are fed to the vision transformer to capture the long-distance dependencies among the different tokens. A depth-wise separable convolution multi-layer perceptron is used to assist the attention mechanism for further excavation of the deep information of HSI. Finally, the performance of HSI classification is improved by fusing the coarse- and fine-granularity representations to generate stronger features. Experimental results on four standard datasets verify the marked improvement of the MSTViT over state-of-the-art CNN and transformer structures. The code of this work is available at https://github.com/zhaolin6/MSTViT for the sake of reproducibility.


Pages: 6538-6560

Page count: 23

32 references in total

