Efficient Depth Fusion Transformer for Aerial Image Semantic Segmentation

Cited by: 22
Authors
Yan, Li [1 ,2 ]
Huang, Jianming [1 ]
Xie, Hong [1 ]
Wei, Pengcheng [1 ]
Gao, Zhao [2 ]
Affiliations
[1] Wuhan Univ, Sch Geodesy & Geomat, Wuhan 430079, Peoples R China
[2] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
Keywords
semantic segmentation; self-attention; depth fusion; transformer; RESOLUTION; RGB;
DOI
10.3390/rs14051294
Chinese Library Classification
X [Environmental Science, Safety Science];
Discipline classification code
08 ; 0830 ;
Abstract
Incorporating depth has been shown to improve semantic segmentation by providing additional geometric information. Most existing works adopt a two-stream network that extracts features from color images and depth images separately, using two branches of the same structure, and therefore suffers from high memory and computation costs. We find that depth features obtained by simple downsampling can also complement color features in semantic segmentation, sometimes even outperforming the two-stream scheme with two identical branches. In this paper, we propose a novel and efficient depth fusion transformer network for aerial image segmentation. The network uses patch merging to downsample the depth input, and a depth-aware self-attention (DSA) module is designed to mitigate the gap caused by the differences between the two branches and the two modalities. Concretely, the DSA fuses depth features and color features by computing depth similarity and its impact on the self-attention map calculated from the color features. Extensive experiments on the ISPRS 2D semantic segmentation dataset validate the efficiency and effectiveness of our method. With nearly half the parameters of the traditional two-stream scheme, our method achieves 83.82% mIoU on the Vaihingen dataset, outperforming other state-of-the-art methods, and 87.43% mIoU on the Potsdam dataset, comparable to the state of the art.
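The abstract names two mechanisms: patch merging to downsample the depth input, and a depth similarity that influences the color self-attention map. The sketch below illustrates that general idea in NumPy and is not the authors' implementation. The exact DSA formulation is not given in this record, so the similarity function (an exponential of the absolute difference in mean patch depth), the multiplicative fusion followed by row renormalization, the single attention head, and the names `patch_merge` and `depth_aware_attention` are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def patch_merge(depth, patch):
    """Group each patch x patch block of an (H, W) depth map into one token.

    Returns an array of shape (num_tokens, patch * patch). H and W are
    assumed to be divisible by `patch`.
    """
    h, w = depth.shape
    blocks = depth.reshape(h // patch, patch, w // patch, patch)
    # (h/p, p, w/p, p) -> (h/p, w/p, p, p) -> (num_tokens, p*p)
    return blocks.transpose(0, 2, 1, 3).reshape(-1, patch * patch)


def depth_aware_attention(color_tokens, depth_tokens, w_q, w_k, w_v):
    """Single-head self-attention on color tokens, reweighted by depth.

    ASSUMPTION: the depth similarity and the fusion rule below are
    hypothetical stand-ins, not the formulation used in the paper.
    """
    q = color_tokens @ w_q
    k = color_tokens @ w_k
    v = color_tokens @ w_v

    # Standard scaled dot-product attention computed from color only.
    color_attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))

    # Hypothetical depth similarity: 1 for equal mean depth, decaying
    # toward 0 as the mean depths of two tokens move apart.
    mean_depth = depth_tokens.mean(axis=1)
    depth_sim = np.exp(-np.abs(mean_depth[:, None] - mean_depth[None, :]))

    # Let depth similarity modulate the color attention map, then
    # renormalize so every row is again a probability distribution.
    fused = color_attn * depth_sim
    fused = fused / fused.sum(axis=-1, keepdims=True)
    return fused @ v, fused
```

In this sketch the depth branch has no learned parameters at all, which loosely mirrors the abstract's point that a lightweight depth path can replace a full second encoder branch. When the depth map is constant, the similarity matrix is all ones and the result reduces to ordinary color self-attention.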
Pages: 18