MDRNet: a lightweight network for real-time semantic segmentation in street scenes

Cited by: 17
Authors
Dai, Yingpeng [1 ]
Wang, Junzheng [1 ]
Li, Jiehao [1 ]
Li, Jing [1 ]
Affiliations
[1] Beijing Inst Technol, State Key Lab Intelligent Control & Decis Complex, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Environmental perception; Lightweight neural network; Real-time application; Unmanned platform; ROBOT;
DOI
10.1108/AA-06-2021-0078
Chinese Library Classification (CLC) code
TP [automation technology; computer technology]
Discipline classification code
0812
Abstract
Purpose: This paper focuses on environmental perception for unmanned platforms in complex street scenes. Such platforms impose strict requirements on both accuracy and inference speed, so striking a trade-off between the two while extracting environmental information is a challenge.
Design/methodology/approach: A novel multi-scale depth-wise residual (MDR) module is proposed. It makes full use of depth-wise separable convolution, dilated convolution and 1-dimensional (1-D) convolution to extract local and contextual information jointly while keeping the module small and shallow. Based on the MDR module, a network named multi-scale depth-wise residual network (MDRNet) is then designed for fast semantic segmentation. The network extracts multi-scale information and maintains feature maps at high spatial resolution to handle objects at multiple scales.
Findings: Experiments on the CamVid and Cityscapes data sets show that the proposed MDRNet produces competitive results in both computational time and accuracy during inference. Specifically, it achieves 67.47% and 68.7% mean intersection over union (MIoU) on CamVid and Cityscapes, respectively, with only 0.84 million parameters and fast inference speed on a single GTX 1070Ti card.
Originality/value: This research provides a theoretical and engineering basis for environmental perception on unmanned platforms and supplies environmental information to support subsequent work.
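The record does not give the internal layout of the MDR module, but the operations named in the abstract suggest a structure along the lines of the minimal PyTorch sketch below. The class name MDRBlockSketch, the two-branch layout, the channel widths and the dilation rate are assumptions made for illustration only, not the authors' published design.

# Illustrative sketch only: a plausible "MDR-style" residual block combining the
# operations named in the abstract (depth-wise separable convolution, dilated
# convolution and factorized 1-D convolutions). All layer choices are assumptions.
import torch
import torch.nn as nn


class MDRBlockSketch(nn.Module):
    def __init__(self, channels: int, dilation: int = 2):
        super().__init__()
        # Branch 1: depth-wise separable 3x3 convolution (local information).
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False),
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Branch 2: factorized 1-D (3x1 then 1x3) dilated depth-wise convolutions
        # (contextual information with an enlarged receptive field).
        self.context = nn.Sequential(
            nn.Conv2d(channels, channels, (3, 1), padding=(dilation, 0),
                      dilation=(dilation, 1), groups=channels, bias=False),
            nn.Conv2d(channels, channels, (1, 3), padding=(0, dilation),
                      dilation=(1, dilation), groups=channels, bias=False),
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # 1x1 convolution fuses the two branches back to the input width.
        self.fuse = nn.Conv2d(2 * channels, channels, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = torch.cat([self.local(x), self.context(x)], dim=1)
        # Residual connection keeps the block shallow and easy to optimize.
        return x + self.fuse(y)


if __name__ == "__main__":
    block = MDRBlockSketch(channels=64, dilation=2)
    out = block(torch.randn(1, 64, 128, 256))
    print(out.shape)  # torch.Size([1, 64, 128, 256])

Depth-wise and 1-D convolutions keep the parameter count low, which is consistent with the 0.84 million parameters reported in the abstract; the actual dilation rates and fusion scheme would need to be taken from the paper itself.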
Pages: 725-733
Page count: 9