MUSTER: A Multi-Scale Transformer-Based Decoder for Semantic Segmentation

Times cited: 0
Authors
Xu, Jing [1]
Shi, Wentao [1]
Gao, Pan [1]
Li, Qizhu [2]
Wang, Zhengwei [3]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 211106, Peoples R China
[2] TikTok Pte Ltd, Singapore 048583, Singapore
[3] ByteDance, Shanghai 201103, Peoples R China
Keywords
Transformers; Decoding; Semantic segmentation; Head; Convolutional neural networks; Semantics; Computer architecture; transformer; decoder; lightweight; feature fusion; IMAGE SEGMENTATION;
DOI
10.1109/TETCI.2024.3449911
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In recent works on semantic segmentation, there has been a significant focus on designing and integrating transformer-based encoders. However, less attention has been given to transformer-based decoders. We emphasize that the decoder stage is as vital as the encoder in achieving superior segmentation performance: it disentangles and refines high-level cues, enabling precise object boundary delineation at the pixel level. In this paper, we introduce a novel transformer-based decoder called MUSTER, which integrates seamlessly with hierarchical encoders and consistently delivers high-quality segmentation results regardless of the encoder architecture. Furthermore, we present a variant of MUSTER that reduces FLOPs while maintaining performance. MUSTER incorporates carefully designed multi-head skip attention (MSKA) units and introduces an innovative upsampling operation. The MSKA units fuse multi-scale features from the encoder and decoder, facilitating comprehensive information integration. The upsampling operation leverages encoder features to enhance object localization and surpasses traditional upsampling methods, improving mIoU (mean Intersection over Union) by 0.4% to 3.2%. On the challenging ADE20K dataset, our best model achieves a single-scale mIoU of 50.23 and a multi-scale mIoU of 51.88, which is on par with the current state-of-the-art model. Remarkably, we achieve this while reducing the number of FLOPs by 61.3%.
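The results above are reported in mIoU. For reference, the following is a minimal Python/NumPy sketch of how mIoU is conventionally computed for semantic segmentation. It builds a confusion matrix, takes per-class IoU = TP / (TP + FP + FN), and averages over the classes that occur. The function names `confusion_matrix` and `mean_iou` and the ignore label 255 are illustrative assumptions. They are not taken from the MUSTER paper or its evaluation code.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Accumulate a num_classes x num_classes confusion matrix.

    Rows index ground-truth classes, columns index predicted classes.
    ignore_index=255 is an assumed "unlabeled" convention.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    valid = target != ignore_index  # drop unlabeled pixels
    # Encode each (target, pred) pair as a single flat index, then count.
    idx = num_classes * target[valid].astype(np.int64) + pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Return (per-class IoU, mIoU) from a confusion matrix.

    Classes absent from both prediction and ground truth get IoU = NaN
    and are excluded from the mean.
    """
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as class c, labeled otherwise
    fn = conf.sum(axis=1) - tp  # labeled class c, predicted otherwise
    denom = tp + fp + fn
    iou = np.divide(tp, denom, out=np.full_like(tp, np.nan), where=denom > 0)
    return iou, float(np.nanmean(iou))


if __name__ == "__main__":
    pred = [[0, 1, 1], [1, 2, 0]]
    target = [[0, 0, 1], [1, 2, 2]]
    iou, miou = mean_iou(confusion_matrix(pred, target, num_classes=3))
    print(iou, miou)  # per-class IoU [1/3, 2/3, 1/2], mIoU 0.5
```

Benchmark toolkits typically accumulate the confusion matrix over the whole validation set before computing IoU, rather than averaging per-image scores. This record does not state whether the reported ADE20K figures follow exactly this protocol.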
Pages: 202-212
Number of pages: 11