Lightweight Real-Time Semantic Segmentation Network With Efficient Transformer and CNN

Times cited: 72
Authors
Xu, Guoan [1, 2]
Li, Juncheng [3, 4]
Gao, Guangwei [1, 2]
Lu, Huimin [5]
Yang, Jian [6]
Yue, Dong [1, 2]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Inst Adv Technol, Nanjing, 210023, Peoples R China
[2] Soochow Univ, Prov Key Lab Comp Informat Proc Technol, Suzhou 215006, Peoples R China
[3] Shanghai Univ, Sch Commun & Informat Engn, Shanghai, Peoples R China
[4] Nanjing Univ Sci & Technol, Jiangsu Key Lab Image & Video Understanding Social, Nanjing 210049, Peoples R China
[5] Kyushu Inst Technol, Kitakyushu 8048550, Japan
[6] Nanjing Univ Sci & Technol, Sch Comp Sci & Technol, Nanjing 210049, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Transformers; Convolution; Semantic segmentation; Task analysis; Convolutional neural networks; Computational modeling; Semantics; Real-time semantic segmentation; convolutional neural network; lightweight network; transformer;
DOI
10.1109/TITS.2023.3248089
Chinese Library Classification (CLC) number
TU [Architectural Science]
Discipline classification code
0813
Abstract
In the past decade, convolutional neural networks (CNNs) have become prominent in semantic segmentation. Although CNN models perform impressively, their ability to capture global representations remains insufficient, which leads to suboptimal results. The Transformer has achieved great success in NLP tasks, demonstrating its advantage in modeling long-range dependencies. Recently, it has also attracted tremendous attention from computer vision researchers, who reformulate image processing tasks as sequence-to-sequence prediction; however, this reformulation degrades local feature details. In this work, we propose a lightweight real-time semantic segmentation network called LETNet. LETNet effectively combines a U-shaped CNN with a Transformer in a capsule embedding style so that each compensates for the other's deficiencies. Meanwhile, the elaborately designed Lightweight Dilated Bottleneck (LDB) module and Feature Enhancement (FE) module both have a positive effect on training from scratch. Extensive experiments on challenging datasets demonstrate that LETNet achieves a superior balance between accuracy and efficiency. Specifically, it contains only 0.95M parameters and 13.6G FLOPs, yet yields 72.8% mIoU at 120 FPS on the Cityscapes test set and 70.5% mIoU at 250 FPS on the CamVid test set using a single RTX 3090 GPU. Source code will be available at https://github.com/IVIPLab/LETNet.
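The accuracy figures above are reported as mIoU (mean intersection-over-union): for each class c, IoU_c = TP / (TP + FP + FN), and mIoU is the average over classes. The following is a minimal pure-Python sketch of that standard metric, not the authors' evaluation code. The function names, the ignore label 255 (the Cityscapes convention for unlabeled pixels), and the choice to skip classes that appear in neither ground truth nor prediction are assumptions made for illustration.

```python
from typing import List, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count pixels per (ground-truth, predicted) class pair.

    Rows index the ground-truth class, columns the predicted class.
    Pixels whose ground-truth label equals ignore_index (assumed 255) are skipped.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:
            continue  # unlabeled pixel: excluded from the metric
        cm[g][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average IoU_c = TP / (TP + FP + FN) over classes present in GT or prediction."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class is another
        denom = tp + fp + fn
        if denom > 0:  # assumption: skip classes absent from both GT and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Six flattened "pixels"; the last one carries the ignore label.
    gt = [0, 0, 1, 1, 2, 255]
    pred = [0, 1, 1, 1, 2, 0]
    cm = confusion_matrix(gt, pred, num_classes=3)
    print(round(mean_iou(cm), 4))  # per-class IoU 1/2, 2/3, 1 -> 0.7222
```

In practice the confusion matrix is accumulated over every pixel of every image in the evaluation set before the per-class IoUs are computed, so large, frequent classes and small, rare ones each contribute one term to the average.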
Pages: 15897-15906
Number of pages: 10