Adaptive Local Cross-Channel Vector Pooling Attention Module for Semantic Segmentation of Remote Sensing Imagery

Cited by: 8
Authors
Wang, Xiaofeng [1]
Kang, Menglei [1]
Chen, Yan [2]
Jiang, Wenxiang [2]
Wang, Mengyuan [2]
Weise, Thomas [2]
Tan, Ming [2]
Xu, Lixiang [1]
Li, Xinlu [1]
Zou, Le [1]
Zhang, Chen [1]
Affiliations
[1] Hefei Univ, Sch Artificial Intelligence & Big Data, Dept Big Data & Informat Engn, Hefei 230601, Peoples R China
[2] Hefei Univ, Inst Appl Optimizat, Sch Artificial Intelligence & Big Data, Hefei 230601, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
adaptive local cross-channel interaction; vector average pooling; attention mechanism; remote sensing imagery; semantic segmentation; deep learning; NETWORK; CLASSIFICATION; FUSION;
DOI
10.3390/rs15081980
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08 ; 0830 ;
Abstract
Adding an attention module to a deep convolutional semantic segmentation network significantly enhances network performance. However, existing channel attention modules, which focus on the channel dimension, neglect spatial relationships, so location noise is transmitted to the decoder. In addition, spatial attention modules, exemplified by self-attention, have a high training cost and poor execution efficiency, making them unsuitable for large-scale remote sensing data. We propose an efficient vector pooling attention (VPA) module that models the relationship between channels and spatial locations. The module locates spatial information better by performing a unique vector average pooling along the vertical and horizontal dimensions of the feature maps. It also learns the weights directly through adaptive local cross-channel interaction. Multiple weight-learning ablation studies and comparison experiments with classical attention modules were conducted by connecting the VPA module to a modified DeepLabV3 network with ResNet50 as the encoder. The results show that adding the VPA module with adaptive local cross-channel interaction increases the mIoU of our network by 3% over the standard network on the MO-CSSSD. The VPA-based semantic segmentation network also significantly improves precision efficiency compared with other conventional attention networks. Furthermore, on the WHU Building dataset, IoU and F1-score improve by 1.69% and 0.97%, respectively, and on the ISPRS Vaihingen dataset, our network raises the mIoU by 1.24%. The VPA module also significantly improves the network's performance on small target segmentation.
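The two ideas named in the abstract, vector average pooling along each spatial axis and adaptive local cross-channel interaction, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the kernel-size rule (with `gamma=2`, `b=1`) is borrowed from ECA-Net as an assumption, the uniform kernel stands in for learned weights, and the way the two attention vectors are multiplied back onto the feature map is likewise an assumption. All function names are hypothetical.

```python
import math


def adaptive_kernel_size(channels, gamma=2, b=1):
    """Odd 1D kernel size that grows with the channel count.
    Assumption: ECA-Net style rule, not confirmed by the abstract."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def cross_channel_conv(desc, kernel):
    """1D convolution along the channel axis with zero padding.
    desc is a list of C vectors of equal length; the output has the same shape."""
    c, pad, length = len(desc), len(kernel) // 2, len(desc[0])
    out = []
    for ch in range(c):
        row = [0.0] * length
        for j, weight in enumerate(kernel):
            src = ch + j - pad  # neighbouring channel index
            if 0 <= src < c:
                for i in range(length):
                    row[i] += weight * desc[src][i]
        out.append(row)
    return out


def vector_pool_attention(x):
    """x is a nested list of shape C x H x W; returns a reweighted copy."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    # Vector average pooling: one value per row (C x H) and per column (C x W).
    pooled_h = [[sum(x[ch][i]) / w for i in range(h)] for ch in range(c)]
    pooled_w = [[sum(x[ch][i][j] for i in range(h)) / h for j in range(w)]
                for ch in range(c)]
    # Local cross-channel interaction with an adaptively sized kernel.
    k = adaptive_kernel_size(c)
    kernel = [1.0 / k] * k  # placeholder for learned weights (assumption)
    att_h = [[sigmoid(v) for v in row]
             for row in cross_channel_conv(pooled_h, kernel)]
    att_w = [[sigmoid(v) for v in row]
             for row in cross_channel_conv(pooled_w, kernel)]
    # Reweight every position by its row and column attention (assumption).
    return [[[x[ch][i][j] * att_h[ch][i] * att_w[ch][j] for j in range(w)]
             for i in range(h)] for ch in range(c)]
```

In a real network the pooled vectors would be tensors and the kernel a trainable 1D convolution; the sketch only shows how the pooled row and column descriptors keep positional information that a single global average per channel would discard.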
Pages: 20