Transformer Meets Convolution: A Bilateral Awareness Network for Semantic Segmentation of Very Fine Resolution Urban Scene Images

Cited by: 172
Authors
Wang, Libo [1 ]
Li, Rui [1 ]
Wang, Dongzhi [2 ]
Duan, Chenxi [3 ]
Wang, Teng [2 ]
Meng, Xiaoliang [1 ]
Affiliations
[1] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan 430079, Peoples R China
[2] Surveying & Mapping Inst, Lands & Resource Dept Guangdong Prov, Guangzhou 510500, Peoples R China
[3] Wuhan Univ, State Key Lab Informat Engn Surveying Mapping & R, Wuhan 430079, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
urban scene segmentation; remote sensing; transformer; attention mechanism; land-cover; deep; classification
DOI
10.3390/rs13163065
Chinese Library Classification
X [Environmental Science, Safety Science]
Discipline Classification Code
08; 0830
Abstract
Semantic segmentation of very fine resolution (VFR) urban scene images plays a significant role in several application scenarios, including autonomous driving, land cover classification, and urban planning. However, the tremendous detail contained in VFR images, especially the considerable variation in the scale and appearance of objects, severely limits the potential of existing deep learning approaches. Addressing such issues represents a promising research field in the remote sensing community, paving the way for scene-level landscape pattern analysis and decision making. In this paper, we propose a Bilateral Awareness Network (BANet) that contains a dependency path and a texture path to fully capture the long-range relationships and fine-grained details in VFR images. Specifically, the dependency path is built on ResT, a novel Transformer backbone with memory-efficient multi-head self-attention, while the texture path is constructed from stacked convolution operations. In addition, a feature aggregation module based on the linear attention mechanism is designed to effectively fuse the dependency features and texture features. Extensive experiments conducted on three large-scale urban scene image segmentation datasets, i.e., the ISPRS Vaihingen, ISPRS Potsdam, and UAVid datasets, demonstrate the effectiveness of BANet; in particular, it achieves 64.6% mIoU on the UAVid dataset.
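The bilateral layout described in the abstract lends itself to a compact sketch. Below is a minimal PyTorch illustration of the two-path design with a linear-attention fusion step; all module names, layer widths, and strides here are illustrative assumptions, and the dependency path is a plain convolutional stand-in rather than the ResT transformer backbone the paper actually uses.

```python
# Minimal structural sketch of a bilateral (texture + dependency) network.
# Everything here is an assumption-laden illustration, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_bn_relu(in_ch, out_ch, stride=1):
    """3x3 convolution followed by batch norm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class TexturePath(nn.Module):
    """Detail branch: a shallow stack of strided convolutions that keeps
    fine-grained spatial information at 1/8 resolution."""
    def __init__(self, out_ch=128):
        super().__init__()
        self.stage = nn.Sequential(
            conv_bn_relu(3, 64, stride=2),
            conv_bn_relu(64, 64, stride=2),
            conv_bn_relu(64, out_ch, stride=2),
        )

    def forward(self, x):
        return self.stage(x)  # (B, out_ch, H/8, W/8)


class DependencyPath(nn.Module):
    """Context branch: a convolutional stand-in for the ResT backbone.
    It produces coarse 1/16-resolution features and upsamples them to
    match the texture path; swap in a real ResT for the paper's setup."""
    def __init__(self, out_ch=128):
        super().__init__()
        self.stage = nn.Sequential(
            conv_bn_relu(3, 64, stride=2),
            conv_bn_relu(64, 128, stride=2),
            conv_bn_relu(128, 256, stride=2),
            conv_bn_relu(256, out_ch, stride=2),
        )

    def forward(self, x):
        feat = self.stage(x)  # (B, out_ch, H/16, W/16)
        return F.interpolate(feat, scale_factor=2.0, mode="bilinear",
                             align_corners=False)


class LinearAttentionFusion(nn.Module):
    """Fuses the two paths with linear-complexity attention: softmax is
    applied to queries (over channels) and keys (over positions) separately,
    so a C x C key-value product is formed first and the overall cost stays
    linear in the number of spatial positions N."""
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)

    def forward(self, texture_feat, dependency_feat):
        b, c, h, w = texture_feat.shape
        # Queries from the texture path; keys/values from the dependency path.
        q = self.q(texture_feat).flatten(2).transpose(1, 2).softmax(dim=-1)    # (B, N, C)
        k = self.k(dependency_feat).flatten(2).transpose(1, 2).softmax(dim=1)  # (B, N, C)
        v = self.v(dependency_feat).flatten(2).transpose(1, 2)                 # (B, N, C)
        context = k.transpose(1, 2) @ v            # (B, C, C): O(N) in positions
        out = (q @ context).transpose(1, 2).reshape(b, c, h, w)
        return out + texture_feat                  # residual keeps texture detail


class BANetSketch(nn.Module):
    """Bilateral layout: texture path + dependency path, attention fusion,
    then a 1x1 segmentation head upsampled to the input size."""
    def __init__(self, num_classes=6, channels=128):
        super().__init__()
        self.texture = TexturePath(channels)
        self.dependency = DependencyPath(channels)
        self.fusion = LinearAttentionFusion(channels)
        self.head = nn.Conv2d(channels, num_classes, 1)

    def forward(self, x):
        fused = self.fusion(self.texture(x), self.dependency(x))
        logits = self.head(fused)
        return F.interpolate(logits, size=x.shape[2:], mode="bilinear",
                             align_corners=False)


if __name__ == "__main__":
    model = BANetSketch(num_classes=6)
    out = model(torch.randn(1, 3, 512, 512))
    print(out.shape)  # torch.Size([1, 6, 512, 512])
```

The fusion module follows the standard linear-attention trick of normalizing queries and keys separately so the key-value product can be computed before multiplying by the queries, reducing the attention cost from quadratic to linear in the number of spatial positions.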
Pages: 20