Deformable Transformer and Spectral U-Net for Large-Scale Hyperspectral Image Semantic Segmentation

Cited by: 1

Authors:

Zhang, Tianjian [1]

Xue, Zhaohui [2]

Su, Hongjun [2]

Affiliations:

[1] Hohai Univ, Sch Earth Sci & Engn, Nanjing 211100, Peoples R China

[2] Hohai Univ, Coll Geog & Remote Sensing, Nanjing 211100, Peoples R China

Source:

IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING | 2024, Vol. 17

Funding:

National Natural Science Foundation of China

Keywords:

Feature extraction; Transformers; Data mining; Hyperspectral imaging; Semantics; Semantic segmentation; Convolution; Sensors; Land surface; Decoding; Deep learning; hyperspectral remote sensing; large-scale; semantic segmentation; transformer; NETWORK;

DOI:

10.1109/JSTARS.2024.3485239

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronics and communication technology]

Subject classification codes:

0808 ; 0809 ;

Abstract:

Remote sensing semantic segmentation aims to extract land cover types automatically by classifying each pixel. Large-scale hyperspectral remote sensing images, however, combine rich spectral information, complex and diverse spatial distributions, significant scale variations, and a wide variety of finely detailed land cover types, all of which make segmentation difficult. To address these challenges, this study introduces DTSU-Net, a U-shaped semantic segmentation network that combines global spectral attention with a deformable Transformer for segmenting large-scale hyperspectral remote sensing images. First, convolution and global spectral attention emphasize the features with the richest spectral information, effectively extracting spectral characteristics. Second, deformable self-attention captures global-local information, addressing the complex scale and distribution of objects. Finally, deformable cross-attention aggregates deep and shallow features, enabling comprehensive semantic information mining. Experiments on a large-scale hyperspectral remote sensing dataset (WHU-OHS) show the following. First, in Changchun, Shanghai, Guangzhou, and Karamay, DTSU-Net achieved the highest mIoU among the compared methods, reaching 56.19%, 37.89%, 52.90%, and 63.54%, respectively, with an average improvement of 7.57% to 34.13% over the baseline methods. Second, module ablation experiments confirm the effectiveness of the proposed modules, and the deformable Transformer significantly reduces training costs compared to conventional Transformers. Third, the approach achieves the highest mIoU across the entire dataset, 57.22%, with a balanced trade-off between accuracy and parameter efficiency, an improvement of 1.65% to 56.58% over the baseline methods.
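The abstract reports all results as mIoU (mean intersection over union). As a reading aid, the following is a minimal sketch of how that metric is commonly computed from flattened label maps. It is not the authors' evaluation code: the function name `mean_iou`, the flat integer-label input, and the choice to skip classes that appear in neither map are assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Mean intersection over union, averaged over classes that occur
    in the ground truth or in the prediction (illustrative sketch)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")

    intersection = [0] * num_classes  # pixels where truth and prediction agree on class c
    true_count = [0] * num_classes    # pixels labelled c in the ground truth
    pred_count = [0] * num_classes    # pixels labelled c in the prediction

    for t, p in zip(y_true, y_pred):
        true_count[t] += 1
        pred_count[p] += 1
        if t == p:
            intersection[t] += 1

    ious = []
    for c in range(num_classes):
        union = true_count[c] + pred_count[c] - intersection[c]
        if union > 0:  # assumption: classes absent from both maps are skipped
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    print(f"mIoU = {mean_iou(truth, pred, num_classes=3):.4f}")
```

In the toy example, the per-class IoU values are 1/3, 2/3, and 1/2. How absent classes and any ignored label are treated varies between benchmarks, so figures computed this way may not match a paper's reported values exactly.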


Pages: 20227-20244

Page count: 18
