Lightweight Real-Time Semantic Segmentation Network With Efficient Transformer and CNN

Cited by: 46
Authors
Xu, Guoan [1, 2]
Li, Juncheng [3, 4]
Gao, Guangwei [1, 2]
Lu, Huimin [5]
Yang, Jian [6]
Yue, Dong [1, 2]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Inst Adv Technol, Nanjing, 210023, Peoples R China
[2] Soochow Univ, Prov Key Lab Comp Informat Proc Technol, Suzhou 215006, Peoples R China
[3] Shanghai Univ, Sch Commun & Informat Engn, Shanghai, Peoples R China
[4] Nanjing Univ Sci & Technol, Jiangsu Key Lab Image & Video Understanding Social, Nanjing 210049, Peoples R China
[5] Kyushu Inst Technol, Kitakyushu 8048550, Japan
[6] Nanjing Univ Sci & Technol, Sch Comp Sci & Technol, Nanjing 210049, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Transformers; Convolution; Semantic segmentation; Task analysis; Convolutional neural networks; Computational modeling; Semantics; Real-time semantic segmentation; convolutional neural network; lightweight network; transformer;
DOI
10.1109/TITS.2023.3248089
Chinese Library Classification
TU [Building Science]
Discipline classification code
0813
Abstract
Over the past decade, convolutional neural networks (CNNs) have become prominent in semantic segmentation. Although CNN models perform impressively, their ability to capture global representations remains insufficient, which leads to suboptimal results. Transformers have recently achieved great success in NLP tasks, demonstrating their advantage in modeling long-range dependencies. They have also attracted considerable attention from computer vision researchers, who reformulate image processing tasks as sequence-to-sequence prediction, but at the cost of degraded local feature details. In this work, we propose a lightweight real-time semantic segmentation network called LETNet. LETNet effectively combines a U-shaped CNN with a Transformer in a capsule embedding style so that each compensates for the other's deficiencies. In addition, the carefully designed Lightweight Dilated Bottleneck (LDB) module and Feature Enhancement (FE) module both benefit training from scratch. Extensive experiments on challenging datasets demonstrate that LETNet achieves a superior balance between accuracy and efficiency. Specifically, it contains only 0.95M parameters and 13.6G FLOPs, yet yields 72.8% mIoU at 120 FPS on the Cityscapes test set and 70.5% mIoU at 250 FPS on the CamVid test set using a single RTX 3090 GPU. Source code will be available at https://github.com/IVIPLab/LETNet.
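The abstract reports accuracy as mIoU (mean intersection over union). As a reading aid, the following is a minimal sketch of how that metric is commonly computed from flattened per-pixel class labels. It is not taken from the LETNet source code; the function name `mean_iou`, the `ignore_index=255` default, and the choice to average only over classes that actually occur are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred, target: flat sequences of integer class ids, one per pixel.
    ignore_index: ground-truth label to skip (assumed value, hypothetical default).
    """
    inter = [0] * num_classes  # true positives per class
    union = [0] * num_classes  # TP + FP + FN per class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel: excluded from the metric
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, one ignored pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
```

In the toy example the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18. Benchmark evaluations typically accumulate the intersection and union counts over the whole test set before dividing, rather than averaging per-image scores.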
Pages: 15897 - 15906
Number of pages: 10
Related papers
50 in total
  • [1] Lightweight and efficient feature fusion real-time semantic segmentation network
    Zhong, Jie
    Chen, Aiguo
    Jiang, Yizhang
    Sun, Chengcheng
    Peng, Yuheng
    IMAGE AND VISION COMPUTING, 2025, 154
  • [2] Lightweight and efficient asymmetric network design for real-time semantic segmentation
    Xiu-Ling Zhang
    Bing-Ce Du
    Zhao-Ci Luo
    Kai Ma
    Applied Intelligence, 2022, 52 : 564 - 579
  • [3] Lightweight and efficient asymmetric network design for real-time semantic segmentation
    Zhang, Xiu-Ling
    Du, Bing-Ce
    Luo, Zhao-Ci
    Ma, Kai
    APPLIED INTELLIGENCE, 2022, 52 (01) : 564 - 579
  • [4] Lightweight Bilateral Network for Real-Time Semantic Segmentation
    Wang, Pengtao
    Li, Lihong
    Pan, Feiyang
    Wang, Lin
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2023, 27 (04) : 673 - 682
  • [5] LACTNet: A Lightweight Real-Time Semantic Segmentation Network Based on an Aggregated Convolutional Neural Network and Transformer
    Zhang, Xiangyue
    Li, Hexiao
    Ru, Jingyu
    Ji, Peng
    Wu, Chengdong
    ELECTRONICS, 2024, 13 (12)
  • [6] ELUNet: an efficient and lightweight U-shape network for real-time semantic segmentation
    Ai, Yufeng
    Guo, Jichang
    Wang, Yudong
    JOURNAL OF ELECTRONIC IMAGING, 2022, 31 (02)
  • [7] RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer
    Wang, Jian
    Gou, Chenhui
    Wu, Qiman
    Feng, Haocheng
    Han, Junyu
    Ding, Errui
    Wang, Jingdong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [8] A lightweight network with attention decoder for real-time semantic segmentation
    Kang Wang
    Jinfu Yang
    Shuai Yuan
    Mingai Li
    The Visual Computer, 2022, 38 : 2329 - 2339
  • [9] Lightweight Asymmetric Dilation Network for Real-Time Semantic Segmentation
    Hu, Xuegang
    Gong, Yu
    IEEE ACCESS, 2021, 9 : 55630 - 55643
  • [10] A lightweight network with attention decoder for real-time semantic segmentation
    Wang, Kang
    Yang, Jinfu
    Yuan, Shuai
    Li, Mingai
    VISUAL COMPUTER, 2022, 38 (07) : 2329 - 2339