Adaptive Local Cross-Channel Vector Pooling Attention Module for Semantic Segmentation of Remote Sensing Imagery

Cited by: 8

Authors:

Wang, Xiaofeng [1]

Kang, Menglei [1]

Chen, Yan [2]

Jiang, Wenxiang [2]

Wang, Mengyuan [2]

Weise, Thomas [2]

Tan, Ming [2]

Xu, Lixiang [1]

Li, Xinlu [1]

Zou, Le [1]

Zhang, Chen [1]

Affiliations:

[1] Hefei Univ, Sch Artificial Intelligence & Big Data, Dept Big Data & Informat Engn, Hefei 230601, Peoples R China

[2] Hefei Univ, Inst Appl Optimizat, Sch Artificial Intelligence & Big Data, Hefei 230601, Peoples R China

Source:

Remote Sensing | 2023, Vol. 15, Issue 8

Funding:

National Natural Science Foundation of China;

Keywords:

adaptive local cross-channel interaction; vector average pooling; attention mechanism; remote sensing imagery; semantic segmentation; deep learning; NETWORK; CLASSIFICATION; FUSION;

DOI:

10.3390/rs15081980

Chinese Library Classification (CLC) number:

X [Environmental Science, Safety Science];

Discipline classification code:

08; 0830;

Abstract:

Adding an attention module to deep convolutional semantic segmentation networks has significantly enhanced network performance. However, existing channel attention modules, which focus on the channel dimension, neglect spatial relationships, causing location noise to be transmitted to the decoder. In addition, spatial attention modules, exemplified by self-attention, have a high training cost and face challenges in execution efficiency, making them unsuitable for handling large-scale remote sensing data. We propose an efficient vector pooling attention (VPA) module for building channel and spatial location relationships. The module locates spatial information better by performing a unique vector average pooling along the vertical and horizontal dimensions of the feature maps. Furthermore, it learns the weights directly by using adaptive local cross-channel interaction. Multiple weight-learning ablation studies and comparison experiments with classical attention modules were conducted by connecting the VPA module to a modified DeepLabV3 network using ResNet50 as the encoder. The results show that the mIoU of our network with the adaptive local cross-channel interaction VPA module increases by 3% compared to the standard network on the MO-CSSSD. The VPA-based semantic segmentation network can significantly improve precision efficiency compared with other conventional attention networks. Furthermore, the results on the WHU Building dataset show improvements in IoU and F1-score of 1.69% and 0.97%, respectively. Our network raises the mIoU by 1.24% on the ISPRS Vaihingen dataset. The VPA module can also significantly improve the network's performance on small-target segmentation.
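The two ideas named in the abstract, vector average pooling along each spatial axis and local cross-channel interaction with an adaptively sized kernel, can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the function names, the ECA-style kernel-size rule (with hypothetical defaults `gamma=2`, `b=1`), the shared 1D kernel, and the way the two attention vectors are multiplied back onto the feature map are all assumptions made for illustration.

```python
import math


def adaptive_kernel_size(channels, gamma=2, b=1):
    """Assumed ECA-style rule: an odd kernel size that grows with log2(channels)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv1d_over_channels(vectors, weights):
    """Zero-padded 1D convolution along the channel axis.

    vectors: list of C lists (one pooled vector per channel).
    weights: odd-length kernel shared by all spatial positions.
    """
    num_channels, length = len(vectors), len(vectors[0])
    pad = len(weights) // 2
    out = [[0.0] * length for _ in range(num_channels)]
    for c in range(num_channels):
        for j, w in enumerate(weights):
            src = c + j - pad  # neighbouring channel index
            if 0 <= src < num_channels:
                for p in range(length):
                    out[c][p] += w * vectors[src][p]
    return out


def vector_pooling_attention(x, weights):
    """Hypothetical VPA-like block on a feature map x[c][h][w]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Vector average pooling: average over the width gives one value per row,
    # average over the height gives one value per column.
    pool_h = [[sum(x[c][h]) / W for h in range(H)] for c in range(C)]
    pool_w = [[sum(x[c][h][w] for h in range(H)) / H for w in range(W)]
              for c in range(C)]
    # Local cross-channel interaction followed by a sigmoid gate.
    att_h = [[sigmoid(v) for v in row]
             for row in conv1d_over_channels(pool_h, weights)]
    att_w = [[sigmoid(v) for v in row]
             for row in conv1d_over_channels(pool_w, weights)]
    # Re-weight every position by its row and column attention values.
    return [[[x[c][h][w] * att_h[c][h] * att_w[c][w] for w in range(W)]
             for h in range(H)] for c in range(C)]


# Toy usage: 4 channels, 2x3 feature map, all ones, kernel size from the rule.
features = [[[1.0] * 3 for _ in range(2)] for _ in range(4)]
k = adaptive_kernel_size(len(features))
kernel = [1.0 / k] * k  # placeholder weights; a real module would learn these
weighted = vector_pooling_attention(features, kernel)
```

In a trained network the kernel weights would be learned rather than fixed, and the pooled vectors would normally be processed as tensors in a deep learning framework; the nested lists here only keep the sketch dependency-free.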


Pages: 20

References (48 in total):

[1] Research Contribution and Comprehensive Review towards the Semantic Segmentation of Aerial Images Using Deep Learning Techniques [J].