OCANet: An Overcomplete Convolutional Attention Network for Building Extraction From High-Resolution Remote Sensing Images

Cited: 0
Authors
Zhang, Bo [1 ,2 ]
Huang, Jiajia [1 ,2 ]
Wu, Fan [3 ,4 ]
Zhang, Wenjuan [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Airborne Remote Sensing Ctr, Aerosp Informat Res Inst, Beijing 100094, Peoples R China
[2] Univ Chinese Acad Sci, Coll Resources & Environm, Beijing 100049, Peoples R China
[3] Chinese Acad Sci, Aerosp Informat Res Inst, Int Res Ctr Big Data Sustainable Dev Goals, Beijing 100094, Peoples R China
[4] Hainan Aerosp Informat Res Inst, Key Lab Earth Observ Hainan Prov, Sanya 572000, Peoples R China
Funding
Hainan Provincial Natural Science Foundation;
Keywords
Building extraction; convolutional attention; overcomplete network; semantic segmentation;
DOI
10.1109/JSTARS.2024.3471804
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Classification Codes
0808 ; 0809 ;
Abstract
Building extraction from remote sensing (RS) images holds a crucial position in the fields of urban planning and sustainable development. In high-resolution (HR) RS images, the characteristics of buildings, including their shapes, structures, and textures, become increasingly complex. This complexity poses considerable challenges to the prediction and recognition of small, dense, and complex-shaped buildings. To address these problems, we present a novel overcomplete convolutional attention network (OCANet) to enhance the accuracy of building extraction from HR RS images. Specifically, the proposed method adopts a multiscale convolutional attention encoder to focus on the two-dimensional structure of the building image while improving computational efficiency. Additionally, an overcomplete fusion branch module is introduced to constrain the network's deep receptive field size, enabling a more concentrated focus on smaller and denser buildings. Furthermore, an edge refinement fusion module is proposed to further enhance the network's capability to extract building edge details by integrating shallow feature information from different scales with deep semantic information. The efficacy of the individually designed modules is validated through ablation studies on public datasets, including the WHU aerial building dataset and the Massachusetts building dataset. Additionally, a dataset leveraging Gaofen-2 imagery, featuring a variety of building types, is introduced to benchmark against other state-of-the-art networks. Both qualitative and quantitative evaluations demonstrate the ability of OCANet to extract dense, small, and complex-shaped buildings in complex urban landscapes. The proposed method provides excellent performance compared to other networks while reducing computational overhead.
Pages: 18427-18443
Page count: 17