MAFF-HRNet: Multi-Attention Feature Fusion HRNet for Building Segmentation in Remote Sensing Images

Cited by: 19
Authors
Che, Zhihao [1]
Shen, Li [2]
Huo, Lianzhi [3]
Hu, Changmiao [3]
Wang, Yanping [1]
Lu, Yao [2]
Bi, Fukun [1]
Affiliations
[1] North China Univ Technol, Sch Informat, Beijing 100144, Peoples R China
[2] Beijing Inst Remote Sensing, Beijing 100011, Peoples R China
[3] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100094, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
remote sensing; building extraction; built-up extraction; semantic segmentation; CLASSIFICATION; NETWORK;
DOI
10.3390/rs15051382
Chinese Library Classification
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Built-up areas and buildings are two main targets in remote sensing research; consequently, automatic extraction of built-up areas and buildings has attracted extensive attention. This task is usually difficult because of boundary blur, object occlusion, and intra-class inconsistency. In this paper, we propose the multi-attention feature fusion HRNet, MAFF-HRNet, which retains more detailed features to achieve accurate semantic segmentation. A pyramidal feature attention (PFA) hierarchy enhances the multilevel semantic representation of the model. In addition, we develop a mixed convolutional attention (MCA) block, which enlarges the receptive field and mitigates intra-class inconsistency. To alleviate interference due to occlusion, a multiscale attention feature aggregation (MAFA) block is also proposed to enhance the restoration of the final prediction map. Our approach was systematically tested on the WHU (Wuhan University) Building Dataset and the Massachusetts Buildings Dataset. Compared with other advanced semantic segmentation models, our model achieved the best IoU results of 91.69% and 68.32%, respectively. To further evaluate the practical value of the proposed model, we transferred a model pretrained on the World-Cover Dataset to the Gaofen 16 m dataset for testing. Quantitative and qualitative experiments show that our model can accurately segment buildings and built-up areas from remote sensing images.
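The abstract reports results as IoU (intersection over union). As a reminder of what that metric measures for a binary building mask, here is a minimal sketch. It is not the authors' evaluation code: the function name `binary_iou`, the nested-list mask format, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
from typing import Sequence


def binary_iou(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """IoU of the foreground (building) class for two equally shaped binary masks."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels marked as building in both masks
    union = 0         # pixels marked as building in at least one mask
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            p_fg, t_fg = bool(p), bool(t)
            intersection += int(p_fg and t_fg)
            union += int(p_fg or t_fg)

    # Assumed convention: if neither mask contains any building pixel,
    # the prediction matches the ground truth exactly.
    return intersection / union if union else 1.0


# Hypothetical 2x3 masks: 2 pixels overlap, 4 pixels are covered in total.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
truth_mask = [[1, 0, 0],
              [0, 1, 1]]
print(binary_iou(pred_mask, truth_mask))  # 0.5
```

Dataset-level figures such as those in the abstract are typically obtained by accumulating the intersection and union counts over all test images before dividing; whether the paper does exactly that is not stated in this record.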
Pages: 19