DSViT: Dynamically Scalable Vision Transformer for Remote Sensing Image Segmentation and Classification

Cited by: 8
Authors
Wang, Falin [1]
Ji, Jian [1]
Wang, Yuan [1]
Affiliations
[1] Xidian Univ, Sch Comp Sci & Technol, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Transformers; Remote sensing; Feature extraction; Computational modeling; Convolutional neural networks; Convolution; Task analysis; CNN; classification; remote sensing image; semantic segmentation; transformer;
DOI
10.1109/JSTARS.2023.3285259
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline Classification Code
0808; 0809
Abstract
The relationship between foreground targets and background in remote sensing images is highly complex, and vision tasks on such images must cope with complex targets and imbalanced categories, leaving room for improvement in existing modeling methods. This article therefore proposes a dynamically scalable attention model that combines convolutional features with Transformer features. The model dynamically selects its depth according to the size of the input image, which alleviates both the insufficient global information extraction of purely convolutional models and the computational overhead that limits pure Transformer models. We validated the model on two public remote sensing image classification datasets and two remote sensing image segmentation datasets. On the University of California (UC) Merced classification dataset, the proposed method reaches an accuracy of 96.16% and a mean pixel accuracy (mPA) of 93.44%, a net improvement of 5.0% and 4.82% over the pyramid vision transformer (PVT) model. On the Potsdam segmentation dataset, the transformer and CNN hybrid neural network (TCHNN) model achieves an accuracy of 91.5% and an F1 score of 92.86%; the proposed method improves on these by 0.64% and 1.0%, and it also achieves the best results on the other two datasets.
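The abstract describes the mechanism only at a high level. As an illustration, a minimal sketch of the idea, assuming a PyTorch-style implementation, is given below; the patch-embedding stem, the HybridBlock structure, and the depth_for_size rule are hypothetical choices made for this sketch and are not the authors' DSViT code.

import math
import torch
import torch.nn as nn


class HybridBlock(nn.Module):
    # One stage: a 3x3 convolution for local features, followed by
    # multi-head self-attention over the flattened feature map for global context.
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.conv(x))                      # convolutional (local) features
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))  # (B, H*W, C) token sequence
        attn_out, _ = self.attn(tokens, tokens, tokens)   # Transformer (global) features
        return x + attn_out.transpose(1, 2).reshape(b, c, h, w)


class DynamicallyScalableEncoder(nn.Module):
    # Holds a pool of hybrid blocks and, per forward pass, runs only as many
    # of them as the (assumed) depth rule assigns to the input resolution.
    def __init__(self, channels: int = 64, max_depth: int = 8):
        super().__init__()
        # ViT-style patch embedding keeps the token count small.
        self.stem = nn.Conv2d(3, channels, kernel_size=16, stride=16)
        self.blocks = nn.ModuleList(HybridBlock(channels) for _ in range(max_depth))

    @staticmethod
    def depth_for_size(side: int, max_depth: int) -> int:
        # Assumed rule: roughly one extra stage per octave of resolution above 32 px.
        return max(1, min(max_depth, int(math.log2(max(side, 32) / 32)) + 2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        depth = self.depth_for_size(x.shape[-1], len(self.blocks))
        x = self.stem(x)
        for block in self.blocks[:depth]:
            x = block(x)
        return x


if __name__ == "__main__":
    enc = DynamicallyScalableEncoder()
    small, large = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 256, 256)
    print(enc(small).shape)  # shallower path for the smaller image
    print(enc(large).shape)  # deeper path for the larger image

In this sketch the saving comes simply from skipping later stages on small inputs; the published model may realize the depth selection and the convolution/attention coupling differently.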
Pages: 5441-5452
Number of pages: 12