Deformable Transformer and Spectral U-Net for Large-Scale Hyperspectral Image Semantic Segmentation

Cited by: 1
Authors
Zhang, Tianjian [1]
Xue, Zhaohui [2]
Su, Hongjun [2]
Affiliations
[1] Hohai Univ, Sch Earth Sci & Engn, Nanjing 211100, Peoples R China
[2] Hohai Univ, Coll Geog & Remote Sensing, Nanjing 211100, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Transformers; Data mining; Hyperspectral imaging; Semantics; Semantic segmentation; Convolution; Sensors; Land surface; Decoding; Deep learning; hyperspectral remote sensing; large-scale; semantic segmentation; transformer; NETWORK;
DOI
10.1109/JSTARS.2024.3485239
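Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification code
0808; 0809
Abstract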
Semantic segmentation of remote sensing images aims to extract land cover types automatically by classifying every pixel. Large-scale hyperspectral remote sensing images, however, combine rich spectral information, complex and diverse spatial distributions, large scale variations, and many finely detailed land cover types, all of which make segmentation difficult. To address these challenges, this study introduces DTSU-Net, a U-shaped semantic segmentation network that combines global spectral attention with a deformable Transformer for large-scale hyperspectral remote sensing images. First, convolution and global spectral attention emphasize the features that carry the richest spectral information, so that spectral characteristics are extracted effectively. Second, deformable self-attention captures global-local information to handle the complex scales and distributions of objects. Finally, deformable cross-attention aggregates deep and shallow features for comprehensive semantic information mining. Experiments on the large-scale hyperspectral remote sensing dataset WHU-OHS show the following. First, on the cities of Changchun, Shanghai, Guangzhou, and Karamay, DTSU-Net achieved the highest mIoU among the compared methods, reaching 56.19%, 37.89%, 52.90%, and 63.54%, respectively, with average improvements over the baseline methods of 7.57% to 34.13%. Second, module ablation experiments confirm the effectiveness of the proposed modules, and the deformable Transformer significantly reduces training cost compared with conventional Transformers. Third, the approach achieves the highest mIoU on the entire dataset, 57.22%, an improvement of 1.65% to 56.58% over the baseline methods, with a balanced trade-off between accuracy and parameter efficiency.
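The abstract reports results as mIoU (mean intersection over union). As a reading aid, the following is a minimal sketch of how that metric is commonly computed from per-pixel class labels. It is not taken from the paper's code: the function name `mean_iou`, the `ignore_index` parameter, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over classes (hypothetical helper, not the authors' code).

    pred, target: integer class labels of the same shape, values in
    [0, num_classes). Pixels whose target equals ignore_index are skipped.
    """
    pred = np.asarray(pred).ravel().astype(np.int64)
    target = np.asarray(target).ravel().astype(np.int64)

    if ignore_index is not None:
        keep = target != ignore_index
        pred, target = pred[keep], target[keep]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    cm = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection

    # Assumption: classes that never occur in pred or target are excluded.
    valid = union > 0
    if not valid.any():
        return float("nan")
    return float((intersection[valid] / union[valid]).mean())


if __name__ == "__main__":
    # Toy example: class 0 has IoU 1/2, class 1 has IoU 2/3.
    print(round(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2), 4))  # 0.5833
```

Evaluation protocols differ in how they treat unlabeled pixels and absent classes, so figures produced by this sketch are not guaranteed to be comparable with those reported in the abstract.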
Pages: 20227-20244
Page count: 18