LightSeg: Local Spatial Perception Convolution for Real-Time Semantic Segmentation

被引:1
作者
Lei, Xiaochun [1 ,2 ]
Liang, Jiaming [1 ]
Gong, Zhaoting [1 ]
Jiang, Zetao [1 ,2 ]
机构
[1] Guilin Univ Elect Technol, Sch Comp Sci & Informat Secur, Guilin 541004, Peoples R China
[2] Guilin Univ Elect Technol, Guangxi Key Lab Image & G Intelligent Proc, Guilin 541004, Peoples R China
来源
APPLIED SCIENCES-BASEL | 2023年 / 13卷 / 14期
基金
中国国家自然科学基金;
关键词
semantic segmentation; low-power devices; real-time inference; efficient deep learning; NETWORK;
D O I
10.3390/app13148130
中图分类号
O6 [化学];
学科分类号
0703 ;
摘要
Semantic segmentation is increasingly being applied on mobile devices due to advancements in mobile chipsets, particularly in low-power consumption scenarios. However, the lightweight design of mobile devices poses limitations on the receptive field, which is crucial for dense prediction problems. Existing approaches have attempted to balance lightweight designs and high accuracy by downsampling features in the backbone. However, this downsampling may result in the loss of local details at each network stage. To address this challenge, this paper presents a novel solution in the form of a compact and efficient convolutional neural network (CNN) for real-time applications: our proposed model, local spatial perception convolution (LSPConv). Furthermore, the effectiveness of our architecture is demonstrated on the Cityscapes dataset. The results show that our model achieves an impressive balance between accuracy and inference speed. Specifically, our LightSeg, which does not rely on ImageNet pretraining, achieves an mIoU of 76.1 at a speed of 61 FPS on the Cityscapes validation set, utilizing an RTX 2080Ti GPU with mixed precision. Additionally, it achieves a speed of 115.7 FPS on the Jetson NX with int8 precision.
引用
收藏
页数:15
相关论文
共 41 条
  • [1] Segmentation and Recognition Using Structure from Motion Point Clouds
    Brostow, Gabriel J.
    Shotton, Jamie
    Fauqueur, Julien
    Cipolla, Roberto
    [J]. COMPUTER VISION - ECCV 2008, PT I, PROCEEDINGS, 2008, 5302 : 44 - +
  • [2] Semantic object classes in video: A high-definition ground truth database
    Brostow, Gabriel J.
    Fauqueur, Julien
    Cipolla, Roberto
    [J]. PATTERN RECOGNITION LETTERS, 2009, 30 (02) : 88 - 97
  • [3] Deep Spatio-Temporal Random Fields for Efficient Video Segmentation
    Chandra, Siddhartha
    Couprie, Camille
    Kokkinos, Iasonas
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 8915 - 8924
  • [4] Chen J, 2021, arXiv
  • [5] Chen LC, 2016, Arxiv, DOI [arXiv:1412.7062, 10.1080/17476938708814211]
  • [6] Chen LC, 2017, Arxiv, DOI arXiv:1706.05587
  • [7] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
    Chen, Liang-Chieh
    Zhu, Yukun
    Papandreou, George
    Schroff, Florian
    Adam, Hartwig
    [J]. COMPUTER VISION - ECCV 2018, PT VII, 2018, 11211 : 833 - 851
  • [8] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
    Chen, Liang-Chieh
    Papandreou, George
    Kokkinos, Iasonas
    Murphy, Kevin
    Yuille, Alan L.
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (04) : 834 - 848
  • [9] Chen WY, 2020, Arxiv, DOI arXiv:1912.10917
  • [10] Mobile-Former: Bridging MobileNet and Transformer
    Chen, Yinpeng
    Dai, Xiyang
    Chen, Dongdong
    Liu, Mengchen
    Dong, Xiaoyi
    Yuan, Lu
    Liu, Zicheng
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 5260 - 5269