Semantic segmentation using cross-stage feature reweighting and efficient self-attention

Cited by: 1
Authors
Ma, Yingdong [1]
Lan, Xiaobin [1]
Affiliations
[1] Inner Mongolia Univ, Coll Comp Sci, 235 West Daxue Rd, Hohhot, Peoples R China
Keywords
Semantic segmentation; Convolutional neural networks; Transformer; Feature fusion and reweighting; NETWORK;
DOI
10.1016/j.imavis.2024.104996
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, vision transformers (ViTs) have demonstrated strong performance in various computer vision tasks. The success of ViTs can be attributed to their ability to capture long-range dependencies. However, transformer-based approaches often yield segmentation maps with incomplete object structures because of restricted cross-stage information propagation and a lack of low-level details. To address these problems, we introduce a CNN-transformer semantic segmentation architecture which adopts a CNN backbone for multi-level feature extraction and a transformer encoder that focuses on global perception learning. Transformer embeddings of all stages are integrated to compute feature weights for dynamic cross-stage feature reweighting. As a result, high-level semantic context and low-level spatial details can be embedded into each stage to preserve multi-level information. An efficient attention-based feature fusion mechanism is developed to combine reweighted transformer embeddings with CNN features to generate segmentation maps with more complete object structures. Unlike regular self-attention, which has quadratic computational complexity, our efficient self-attention method achieves similar performance with linear complexity. Experimental results on the ADE20K and Cityscapes datasets show that the proposed segmentation approach outperforms most state-of-the-art networks.
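The abstract does not say how the efficient self-attention is constructed, so the following is only an illustrative sketch of one well-known way to obtain linear complexity, not the authors' method. It assumes a kernelized attention with the feature map elu(x) + 1 (a stand-in chosen for this example) and shows that reordering the matrix products avoids the N x N similarity matrix while giving the same output. All function names are hypothetical.

```python
import math

def phi(x):
    # Positive feature map elu(x) + 1 (an assumption made for this sketch).
    return x + 1.0 if x > 0 else math.exp(x)

def matmul(a, b):
    # Plain list-of-lists matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def feature_map(m):
    return [[phi(x) for x in row] for row in m]

def quadratic_attention(q, k, v):
    # Builds the full N x N similarity matrix: cost grows with N^2.
    fq, fk = feature_map(q), feature_map(k)
    sim = matmul(fq, transpose(fk))
    out = []
    for row in sim:
        z = sum(row)  # row normalizer
        out.append([sum(s * vj[c] for s, vj in zip(row, v)) / z
                    for c in range(len(v[0]))])
    return out

def linear_attention(q, k, v):
    # Computes phi(K)^T V once (d x d_v), then reuses it for every query:
    # cost grows linearly with the number of tokens N.
    fq, fk = feature_map(q), feature_map(k)
    kv = matmul(transpose(fk), v)          # d x d_v summary of keys and values
    k_sum = [sum(col) for col in zip(*fk)]  # length-d sum of key features
    out = []
    for row in fq:
        z = sum(a * b for a, b in zip(row, k_sum))  # same normalizer as above
        num = matmul([row], kv)[0]
        out.append([x / z for x in num])
    return out

# Tiny example: 3 tokens, feature dimension 2.
q = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
k = [[0.2, 0.1], [-0.3, 0.5], [0.7, -0.4]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
full = quadratic_attention(q, k, v)
fast = linear_attention(q, k, v)
```

Both functions return the same values up to floating-point rounding; only the order of multiplication, and therefore the scaling with the number of tokens, differs.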
Pages: 11
Related papers
50 in total
  • [41] MASANet: Multi-Angle Self-Attention Network for Semantic Segmentation of Remote Sensing Images
    Zeng, Fuping
    Yang, Bin
    Zhao, Mengci
    Xing, Ying
    Ma, Yiran
    TEHNICKI VJESNIK-TECHNICAL GAZETTE, 2022, 29 (05): : 1567 - 1575
  • [42] Semantic Segmentation Algorithm Based Multi-headed Self-attention for Tea Picking Points
    Song Y.
    Yang S.
    Zheng Z.
    Ning J.
    Nongye Jixie Xuebao/Transactions of the Chinese Society for Agricultural Machinery, 2023, 54 (09): : 297 - 305
  • [43] ShadowGAN-Former: Reweighting self-attention based on mask for shadow removal
    Hu, Jianyi
    Wen, Shuhuan
    Li, Jiaqi
    Karimi, Hamid Reza
    NEURAL NETWORKS, 2025, 185
  • [44] Weakly supervised semantic segmentation for point cloud based on view-based adversarial training and self-attention fusion
    Miao, Yongwei
    Ren, Guoxiang
    Wang, Jinrong
    Liu, Fuchang
    COMPUTERS & GRAPHICS-UK, 2023, 116 : 46 - 54
  • [45] Efficient Attention-Convolution Feature Extractor in Semantic Segmentation for Autonomous Driving Systems
    Mousavi, Seyed-Hamid
    Seyednezhad, Mahdi
    Yow, Kin-Choong
    IEEE ACCESS, 2023, 11 : 142146 - 142161
  • [46] A self-attention based global feature enhancing network for semantic segmentation of large-scale urban street-level point clouds
    Chen, Qi
    Zhang, Zhenxin
    Chen, Siyun
    Wen, Siyuan
    Ma, Hao
    Xu, Zhihua
    INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2022, 113
  • [47] Self-attention CNN for retinal layer segmentation in OCT
    Cao, Guogang
    Wu, Yan
    Peng, Zeyu
    Zhou, Zhilin
    Dai, Cuixia
    BIOMEDICAL OPTICS EXPRESS, 2024, 15 (03) : 1605 - 1617
  • [48] Efficient Attention Pyramid Network for Semantic Segmentation
    Yang, Qirui
    Ku, Tao
    Hu, Kunyuan
    IEEE ACCESS, 2021, 9 : 18867 - 18875
  • [49] Self-Attention blocks in UNet and FCN for accurate semantic segmentation of difficult object classes in autonomous driving
    Mousavi, Seyed-Hamid
    Yow, Kin-Choong
    2023 IEEE CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, CCECE, 2023,
  • [50] Efficient pyramid context encoding and feature embedding for semantic segmentation
    Liu, Mengyu
    Yin, Hujun
    IMAGE AND VISION COMPUTING, 2021, 111