Fast extraction of buildings from remote sensing images by fusion of CNN and Transformer

Cited by: 2

Authors:

Zhang Y. [1,2]

Guo W. [1]

Wu C. [1]

Affiliations:

[1] School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang

[2] Hebei Key Laboratory of Electromagnetic Environmental Effects and Information Processing, Shijiazhuang Tiedao University, Shijiazhuang

Source:

Guangxue Jingmi Gongcheng/Optics and Precision Engineering | 2023 / Vol. 31 / No. 11

Keywords:

building extraction; multi-scale convolution; remote sensing image; sparse feature extractor

DOI:

10.37188/OPE.20233111.1700

CLC number:

Subject classification number:

Abstract:

The efficient extraction of buildings from remote sensing images plays an important role in urban planning, disaster rescue, and military reconnaissance. Building extraction methods based on deep learning have made significant progress in accuracy, with the sparse token transformer network (STTNet) in particular achieving extremely high accuracy. However, these methods are usually implemented using complex convolution operations in extremely large network models, which results in low extraction speed and makes it difficult to meet practical needs. Therefore, in this study, a method is designed for the fast extraction of buildings from remote sensing images. First, multi-scale convolution is introduced into the feature extraction network of the STTNet model, whereby multi-scale features are extracted in the same convolution layer to further improve the feature extraction capability of the model. Second, channel attention is applied to the feature map of the force weights, to effectively learn channel attention weights, thereby solving the problem of floating channel attention weights when using the backbone network to output the learned feature map. Finally, to reduce the number of model parameters and speed up the model, the STTNet model structure is changed from parallel to series. Experiments on the INRIA building dataset show that the proposed method is 18.3% faster than STTNet and that, in terms of accuracy and the intersection over union (IoU) metric, it is better than current mainstream methods. © 2023 Chinese Academy of Sciences. All rights reserved.
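
The IoU metric named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `iou`, the nested-list mask format, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
def iou(pred, target):
    """Intersection over union of two binary masks (nested lists of 0/1).

    Hypothetical helper for illustration; masks must have the same shape.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            # A pixel counts toward the intersection if both masks mark it
            # as building, and toward the union if either mask does.
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks are treated as a perfect match.
    return intersection / union if union else 1.0


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are marked in at least one mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(iou(pred, target))  # 0.5
```

For the toy masks, two building pixels are shared and four are marked by at least one mask, so the score is 2 / 4 = 0.5.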

Citation

Pages: 1700-1709

Page count: 9

18 references in total

[1] XU SH, GUO X Y, et al., Building segmentation in remote sensing image based on multiscale-feature fusion dilated convolution resnet [J], Optics and Precision Engineering, 28, 7, pp. 1588-1599, (2020)
[2] WANG S Y, YANG D F, et al., High-order statistics integration method for automatic building extraction of remote sensing images [J], Optics and Precision Engineering, 27, 11, pp. 2474-2483, (2019)
[3] ZHANG Z X, WANG Y H., JointNet: a common neural network for road and building extraction [J], Remote Sensing, 11, 6, (2019)
[4] GAO L R, et al., Building extraction from high-resolution aerial imagery using a generative adversarial network with spatial and channel attention mechanisms [J], Remote Sensing, 11, 8, (2019)
[5] Semantic segmentation using adversarial networks [EB/OL], (2016)
[6] ZHANG X Q, XIAO Z H, LI D Y, et al., Semantic segmentation of remote sensing images using multiscale decoding network [J], IEEE Geoscience and Remote Sensing Letters, 16, 9, pp. 1492-1496, (2019)
[7] LIU P H, LIU M X, et al., Building footprint extraction from high-resolution images via spatial residual inception convolutional neural network [J], Remote Sensing, 11, 7, pp. 830-848, (2019)
[8] PLAZA A., Hybrid first and second order attention Unet for building segmentation in remote sensing images [J], Science China Information Sciences, 63, 4, pp. 1-12, (2020)
[9] ZHENG S X, ZHAO H S, et al., Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers [C], 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6877-6886
[10] ZHANG Y T, et al., Memory-augmented transformer for remote sensing image semantic segmentation [J], Remote Sensing, 13, 22, (2021)
