SSformer: A Lightweight Transformer for Semantic Segmentation

Cited by: 23
Authors
Shi, Wentao [1 ]
Xu, Jing [1 ]
Gao, Pan [1 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
Source
2022 IEEE 24TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP) | 2022
Keywords
Image Segmentation; Transformer; Multilayer perceptron; Lightweight model;
DOI
10.1109/MMSP55362.2022.9949177
Chinese Library Classification (CLC)
TP31 [Computer Software];
Discipline Code
081202; 0835
Abstract
It is widely believed that Transformers outperform convolutional neural networks in semantic segmentation. Nevertheless, the original Vision Transformer [2] may lack the inductive biases of local neighborhoods and has high time complexity. Recently, Swin Transformer [3] set new records in various vision tasks by using a hierarchical architecture and shifted windows while being more efficient. However, as Swin Transformer is specifically designed for image classification, it may achieve suboptimal performance on dense prediction-based segmentation tasks. Further, simply combining Swin Transformer with existing methods would substantially increase the size and parameter count of the final segmentation model. In this paper, we rethink the Swin Transformer for semantic segmentation and design a lightweight yet effective transformer model, called SSformer. In this model, considering the inherent hierarchical design of Swin Transformer, we propose a decoder that aggregates information from different layers, thus capturing both local and global attention. Experimental results show that the proposed SSformer yields mIoU comparable to state-of-the-art models while maintaining a smaller model size and lower computational cost. Source code and pretrained models are available at: https://github.com/shiwt03/SSformer
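The decoder described in the abstract aggregates features from the different stages of a hierarchical backbone. As a rough illustrative sketch of that idea (not the authors' implementation; the stage channel widths, embedding dimension, number of classes, and fusion strategy below are assumptions), the following PyTorch snippet projects each stage's feature map with a per-stage MLP, upsamples all maps to the resolution of the shallowest stage, and fuses them with a small MLP head to produce per-pixel class logits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightMLPDecoder(nn.Module):
    """Illustrative lightweight decoder over hierarchical backbone features.

    Sketch only: channel counts follow a Swin-T-like backbone and the fusion
    head is a minimal MLP; this is not the SSformer authors' exact code.
    """

    def __init__(self, in_channels=(96, 192, 384, 768), embed_dim=256, num_classes=19):
        super().__init__()
        # One linear projection per backbone stage to a shared embedding width.
        self.proj = nn.ModuleList(nn.Linear(c, embed_dim) for c in in_channels)
        # Fuse the concatenated multi-scale features and predict class logits.
        self.fuse = nn.Sequential(
            nn.Linear(embed_dim * len(in_channels), embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, num_classes),
        )

    def forward(self, features):
        # features: list of tensors (B, C_i, H_i, W_i), from shallow to deep stages.
        batch = features[0].shape[0]
        target_hw = features[0].shape[2:]  # resolution of the shallowest (1/4-scale) map
        upsampled = []
        for feat, proj in zip(features, self.proj):
            h, w = feat.shape[2:]
            tokens = proj(feat.flatten(2).transpose(1, 2))          # (B, H*W, D)
            feat = tokens.transpose(1, 2).reshape(batch, -1, h, w)  # (B, D, H, W)
            feat = F.interpolate(feat, size=target_hw, mode="bilinear",
                                 align_corners=False)
            upsampled.append(feat)
        fused = torch.cat(upsampled, dim=1).flatten(2).transpose(1, 2)  # (B, HW, 4D)
        logits = self.fuse(fused).transpose(1, 2)                       # (B, K, HW)
        return logits.reshape(batch, -1, *target_hw)

# Example with Swin-T-like feature shapes for a 512x512 input:
# feats = [torch.randn(1, 96, 128, 128), torch.randn(1, 192, 64, 64),
#          torch.randn(1, 384, 32, 32), torch.randn(1, 768, 16, 16)]
# LightweightMLPDecoder()(feats).shape  ->  torch.Size([1, 19, 128, 128])
```

A further 4x bilinear upsampling of the logits would recover predictions at the full input resolution.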
Pages: 5