Semantic segmentation using cross-stage feature reweighting and efficient self-attention

Cited by: 1

Authors:

Ma, Yingdong [1]

Lan, Xiaobin [1]

Affiliations:

[1] Inner Mongolia Univ, Coll Comp Sci, 235 West Daxue Rd, Hohhot, Peoples R China

Source:

IMAGE AND VISION COMPUTING | 2024 / Vol. 145

Keywords:

Semantic segmentation; Convolutional neural networks; Transformer; Feature fusion and reweighting; NETWORK;

DOI:

10.1016/j.imavis.2024.104996

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recently, vision transformers (ViTs) have demonstrated strong performance in various computer vision tasks. The success of ViTs can be attributed to their ability to capture long-range dependencies. However, transformer-based approaches often yield segmentation maps with incomplete object structures because of restricted cross-stage information propagation and a lack of low-level details. To address these problems, we introduce a CNN-transformer semantic segmentation architecture that adopts a CNN backbone for multi-level feature extraction and a transformer encoder that focuses on global perception learning. Transformer embeddings of all stages are integrated to compute feature weights for dynamic cross-stage feature reweighting. As a result, high-level semantic context and low-level spatial details can be embedded into each stage to preserve multi-level information. An efficient attention-based feature fusion mechanism is developed to combine reweighted transformer embeddings with CNN features and generate segmentation maps with more complete object structures. Unlike regular self-attention, which has quadratic computational complexity, our efficient self-attention method achieves similar performance with linear complexity. Experimental results on the ADE20K and Cityscapes datasets show that the proposed segmentation approach outperforms most state-of-the-art networks.
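
The abstract contrasts quadratic-cost self-attention with a linear-cost variant but does not say how the linear variant is built. As a rough illustration only, the sketch below shows one generic way to get linear cost in the number of tokens: normalize queries and keys separately, then multiply keys with values first so that no n-by-n attention matrix is ever formed. This is an assumed, commonly used factorization and not the authors' method; the function names `quadratic_attention` and `linear_attention` are hypothetical.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def quadratic_attention(q, k, v):
    """Regular scaled dot-product attention.

    q, k: (n, d); v: (n, d_v). The (n, n) score matrix makes the cost
    grow quadratically with the number of tokens n.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n, n)
    return softmax(scores, axis=-1) @ v      # (n, d_v)


def linear_attention(q, k, v):
    """Factorized attention (illustrative assumption, not the paper's design).

    Queries are normalized over the feature axis and keys over the token
    axis, so the implicit attention matrix q_n @ k_n.T still has rows that
    sum to 1. Computing k_n.T @ v first yields a small (d, d_v) context
    matrix, so the cost grows linearly with n.
    """
    q_n = softmax(q, axis=1)   # each query row sums to 1 over features
    k_n = softmax(k, axis=0)   # each key column sums to 1 over tokens
    context = k_n.T @ v        # (d, d_v), independent of n in size
    return q_n @ context       # (n, d_v)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, d_v = 1024, 32, 32
    q = rng.normal(size=(n, d))
    k = rng.normal(size=(n, d))
    v = rng.normal(size=(n, d_v))
    print(quadratic_attention(q, k, v).shape)  # builds an (n, n) matrix
    print(linear_attention(q, k, v).shape)     # never builds an (n, n) matrix
```

The two functions are not numerically identical, since they normalize differently; the point is only that reordering the matrix products changes the dominant cost from roughly n²·d to n·d².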

Pages: 11

