Scene-level buildings damage recognition based on Cross Conv-Transformer

Cited by: 3
Authors
Shi, Lingfei [1]
Zhang, Feng [1, 2, 5]
Xia, Junshi [3]
Xie, Jibo [4]
Affiliations
[1] Zhejiang Univ, Sch Earth Sci, Hangzhou, Peoples R China
[2] Zhejiang Prov Key Lab Geog Informat Sci, Hangzhou, Peoples R China
[3] RIKEN Ctr Adv Intelligence Project, Geoinformat Unit, Tokyo, Japan
[4] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing, Peoples R China
[5] Zhejiang Univ, Sch Earth Sci, 38 Zheda Rd, Hangzhou 310027, Peoples R China
Keywords
Scene recognition; damaged buildings; aerial images; transformer
DOI
10.1080/17538947.2023.2261770
Chinese Library Classification
P9 [Physical geography]
Subject classification codes
0705; 070501
Abstract
Unlike pixel-based and object-based image recognition, scene-level recognition takes a larger perspective that can improve the efficiency of assessing large-scale building damage. However, the complexity of disaster scenes and the scarcity of datasets are major challenges in identifying building damage. To address these challenges, the Cross Conv-Transformer model is proposed to classify and evaluate the degree of damage to buildings using aerial images taken after earthquakes. We employ Conv-Embedding and Conv-Projection to extract features from the images. Integrating convolution with the Transformer reduces the computational burden of the model while enhancing its feature extraction capabilities. Furthermore, a two-branch Conv-Transformer architecture with global and local attention is designed, with one branch focusing on global features and the other on local features. The cross-attention fusion module merges feature information from the two branches to enrich the classification features. Finally, we use aerial images captured during the Beichuan and Yushu earthquakes as both the training and test sets to assess the model. The proposed Cross Conv-Transformer model improved classification accuracy by 4.7% and 2.1% compared with ViT and EfficientNet, respectively. The results show that the Cross Conv-Transformer model significantly reduces misclassification between the severely and moderately damaged categories.
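The cross-attention fusion described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the module's internals, so everything below is an assumption made for illustration: the token counts, the embedding size, the single attention head, the random projection matrices, and the final pooling and concatenation step. It shows generic scaled dot-product cross-attention between two token sets, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: queries come from one branch,
    keys and values come from the other branch."""
    q = query_tokens @ w_q            # (n_query, dim)
    k = context_tokens @ w_k          # (n_context, dim)
    v = context_tokens @ w_v          # (n_context, dim)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
dim = 16  # hypothetical embedding size

# Hypothetical branch outputs: 5 "global" tokens and 9 "local" tokens.
global_tokens = rng.normal(size=(5, dim))
local_tokens = rng.normal(size=(9, dim))


def make_projections():
    # Random stand-ins for learned query/key/value projection matrices.
    return [rng.normal(size=(dim, dim)) * 0.1 for _ in range(3)]


# Each branch queries the other branch, with its own projections.
g_from_l, attn_gl = cross_attention(global_tokens, local_tokens, *make_projections())
l_from_g, attn_lg = cross_attention(local_tokens, global_tokens, *make_projections())

# Residual connection, mean pooling per branch, then concatenation
# (an assumed way to form one feature vector for a classifier).
fused = np.concatenate([
    (global_tokens + g_from_l).mean(axis=0),
    (local_tokens + l_from_g).mean(axis=0),
])
print(fused.shape)
```

In a trained model the projection matrices would be learned, and the fused vector would feed a classification head that outputs the damage categories.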
Pages: 3987-4007
Page count: 21