LightSeg: Local Spatial Perception Convolution for Real-Time Semantic Segmentation

Cited by: 1

Authors:

Lei, Xiaochun [1,2]

Liang, Jiaming [1]

Gong, Zhaoting [1]

Jiang, Zetao [1,2]

Affiliations:

[1] Guilin Univ Elect Technol, Sch Comp Sci & Informat Secur, Guilin 541004, Peoples R China

[2] Guilin Univ Elect Technol, Guangxi Key Lab Image & G Intelligent Proc, Guilin 541004, Peoples R China

Source:

APPLIED SCIENCES-BASEL | 2023 / Volume 13 / Issue 14

Funding:

National Natural Science Foundation of China;

Keywords:

semantic segmentation; low-power devices; real-time inference; efficient deep learning; NETWORK;

DOI:

10.3390/app13148130

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703;

Abstract:

Semantic segmentation is increasingly being applied on mobile devices due to advancements in mobile chipsets, particularly in low-power consumption scenarios. However, the lightweight design of mobile devices poses limitations on the receptive field, which is crucial for dense prediction problems. Existing approaches have attempted to balance lightweight designs and high accuracy by downsampling features in the backbone. However, this downsampling may result in the loss of local details at each network stage. To address this challenge, this paper presents a novel solution in the form of a compact and efficient convolutional neural network (CNN) for real-time applications: our proposed model, local spatial perception convolution (LSPConv). Furthermore, the effectiveness of our architecture is demonstrated on the Cityscapes dataset. The results show that our model achieves an impressive balance between accuracy and inference speed. Specifically, our LightSeg, which does not rely on ImageNet pretraining, achieves an mIoU of 76.1 at a speed of 61 FPS on the Cityscapes validation set, utilizing an RTX 2080Ti GPU with mixed precision. Additionally, it achieves a speed of 115.7 FPS on the Jetson NX with int8 precision.

Pages: 15
