Building Type Classification Using CNN-Transformer Cross-Encoder Adaptive Learning From Very High Resolution Satellite Images

Cited by: 0
Authors
Zhang, Shaofeng [1]
Li, Mengmeng [1]
Zhao, Wufan [2]
Wang, Xiaoqin [1]
Wu, Qunyong [1]
Affiliations
[1] Fuzhou Univ, Acad Digital China, Minist Educ, Key Lab Spatial Data Min & Informat Sharing, Fuzhou 350108, Peoples R China
[2] Hong Kong Univ Sci & Technol Guangzhou, Urban Governance & Design Thrust, Soc Hub, Guangzhou 511453, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Buildings; Feature extraction; Transformers; Accuracy; Remote sensing; Semantics; Visualization; Architecture; Optimization; Earth; Building type classification; CNN-transformer networks; cross-encoder; feature interaction; very high resolution remote sensing;
DOI
10.1109/JSTARS.2024.3501678
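Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808; 0809;
Abstract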
Building type information indicates the functional properties of buildings and plays a crucial role in smart city development and urban socioeconomic activities. Existing methods for classifying building types often struggle to distinguish accurately between types while maintaining well-delineated boundaries, especially in complex urban environments. This study introduces a novel framework, the CNN-Transformer cross-attention feature fusion network (CTCFNet), for building type classification from very high resolution remote sensing images. CTCFNet integrates convolutional neural networks (CNNs) and Transformers through an interactive cross-encoder fusion module that enhances semantic feature learning and improves classification accuracy in complex scenarios. We also develop an adaptive collaboration optimization module that applies human visual attention mechanisms to enhance the feature representation of building types and boundaries simultaneously. To address the scarcity of datasets for building type classification, we create two new datasets for model evaluation: the urban building type (UBT) dataset and the town building type (TBT) dataset. Extensive experiments on these datasets demonstrate that CTCFNet outperforms popular CNNs, Transformers, and dual-encoder methods in identifying building types across various regions, achieving the highest mean intersection over union (78.20% and 77.11%), F1 scores (86.83% and 88.22%), and overall accuracy (95.07% and 95.73%) on the UBT and TBT datasets, respectively. We conclude that CTCFNet effectively addresses the challenges of high interclass similarity and intraclass inconsistency in complex scenes, yielding results with well-delineated building boundaries and accurate building types.
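The abstract reports three evaluation metrics: mean intersection over union, F1 score, and overall accuracy. The sketch below shows one common way to derive them from a per-class confusion matrix. It is an illustration only: the function name `segmentation_metrics`, the macro-averaging over classes, and the toy matrix are assumptions and are not taken from the paper, which may average or mask classes differently.

```python
from typing import Dict, List


def segmentation_metrics(conf: List[List[int]]) -> Dict[str, float]:
    """Compute mIoU, macro F1, and overall accuracy.

    conf[i][j] is the number of pixels whose true class is i and whose
    predicted class is j (hypothetical layout, assumed for this sketch).
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    ious, f1s = [], []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                        # true c, predicted other
        fp = sum(conf[r][c] for r in range(n)) - tp   # predicted c, true other
        union = tp + fp + fn
        if union == 0:
            continue  # class absent from labels and predictions; skip it
        ious.append(tp / union)
        f1s.append(2 * tp / (2 * tp + fp + fn))

    return {
        "miou": sum(ious) / len(ious),
        "f1": sum(f1s) / len(f1s),
        "oa": sum(conf[c][c] for c in range(n)) / total,
    }


# Toy two-class example (made-up counts, not results from the paper).
metrics = segmentation_metrics([[8, 2], [1, 9]])
print(metrics)
```

Overall accuracy is dominated by frequent classes such as background, which is why it is usually reported alongside per-class averages such as mIoU and F1.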
Pages: 976-994
Number of pages: 19