MUSTER: A Multi-Scale Transformer-Based Decoder for Semantic Segmentation

Cited by: 0

Authors:

Xu, Jing [1]

Shi, Wentao [1]

Gao, Pan [1]

Li, Qizhu [2]

Wang, Zhengwei [3]

Affiliations:

[1] Nanjing University of Aeronautics and Astronautics, College of Computer Science and Technology, Nanjing 211106, People's Republic of China

[2] TikTok Pte Ltd, Singapore 048583, Singapore

[3] ByteDance, Shanghai 201103, People's Republic of China

Source:

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2025, Vol. 9, No. 1

Keywords:

Transformers; Decoding; Semantic segmentation; Head; Convolutional neural networks; Semantics; Computer architecture; transformer; decoder; lightweight; feature fusion; IMAGE SEGMENTATION;

DOI:

10.1109/TETCI.2024.3449911

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recent work on semantic segmentation has focused heavily on designing and integrating transformer-based encoders, while transformer-based decoders have received less attention. We emphasize that the decoder stage is as vital as the encoder in achieving superior segmentation performance: it disentangles and refines high-level cues, enabling precise object boundary delineation at the pixel level. In this paper, we introduce a novel transformer-based decoder called MUSTER, which integrates with hierarchical encoders and consistently delivers high-quality segmentation results regardless of the encoder architecture. We also present a variant of MUSTER that reduces FLOPs while maintaining performance. MUSTER incorporates carefully designed multi-head skip attention (MSKA) units and introduces new upsampling operations. The MSKA units fuse multi-scale features from the encoder and decoder, facilitating comprehensive information integration. The upsampling operation leverages encoder features to enhance object localization and surpasses traditional upsampling methods, improving mIoU (mean Intersection over Union) by 0.4% to 3.2%. On the challenging ADE20K dataset, our best model achieves a single-scale mIoU of 50.23 and a multi-scale mIoU of 51.88, which is on par with the current state-of-the-art model, while reducing the number of FLOPs by 61.3%.
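For readers unfamiliar with the mIoU metric quoted in the abstract, the following is a minimal sketch of the standard definition: per-class intersection over union, averaged over the classes that occur. It is not the paper's evaluation code. The function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration. The value is returned as a fraction, whereas the abstract reports mIoU as a percentage.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes that appear in the prediction or the ground truth.

    `pred` and `target` are flattened per-pixel class labels (hypothetical
    input layout chosen for this sketch).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes   # pixels where both agree on class c
    pred_count = [0] * num_classes     # pixels predicted as class c
    target_count = [0] * num_classes   # pixels labeled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both pred and target
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example with 6 pixels and 3 classes:
# per-class IoU is 1/3, 2/3 and 1/2, so the mean is 0.5.
score = mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
print(score)
```

Benchmark toolkits usually accumulate the intersection and union counts over the whole validation set before dividing, rather than averaging per-image scores; the sketch above treats its input as one such accumulated set of pixels.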


Pages: 202-212

Number of pages: 11
