DRMNet: more efficient bilateral networks for real-time semantic segmentation of road scenes

Cited by: 2
Authors
Zhang, Wenming [1 ]
Zhang, Shaotong [1 ]
Li, Yaqian [1 ]
Li, Haibin [1 ]
Song, Tao [2 ]
Affiliations
[1] Yanshan Univ, Key Lab Ind Comp Control Engn Hebei Prov, Qinhuangdao 066004, Peoples R China
[2] Yanshan Univ, Sch Elect Engn, Hebei Prov Key Lab Test Measurement Technol & Inst, Qinhuangdao 066004, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Real-time; Lightweight network; Semantic segmentation; Feature fusion; Attention mechanism;
DOI
10.1007/s11554-024-01579-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Semantic segmentation is crucial in autonomous driving because it accurately identifies and segments objects and regions. However, on embedded devices, segmentation accuracy conflicts with real-time performance. We propose an efficient lightweight semantic segmentation network (DRMNet) to address this conflict. The model uses a streamlined bilateral structure that encodes semantic and spatial paths, cross-fuses features during encoding, and incorporates unique skip connections to coordinate upsampling within the semantic pathway. We design a new self-calibrated aggregate pyramid pooling module (SAPPM) at the end of the semantic branch to capture more comprehensive multi-scale semantic information while balancing its extraction against inference speed. Furthermore, we design a new feature fusion module that guides the fusion of detail features and semantic features through attention perception, alleviating the problem of semantic information quickly overwhelming spatial detail information. Experimental results on the CityScapes, CamVid, and NightCity datasets demonstrate the effectiveness of DRMNet. On a 2080Ti GPU, DRMNet achieves 78.6% mIoU at 88.3 FPS on the CityScapes dataset, 78.9% mIoU at 149 FPS on the CamVid dataset, and 53.5% mIoU at 160.4 FPS on the NightCity dataset. These results highlight the model's ability to better balance accuracy and real-time performance, making it suitable for embedded devices in autonomous driving applications.
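The accuracy figures in the abstract are reported as mIoU (mean intersection over union). As a reading aid, the following is a minimal sketch of how that metric is commonly computed from a confusion matrix; it is not taken from the paper. The function names, the flat label lists, and the ignore label 255 are illustrative assumptions, and the authors' evaluation code may differ.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count pixels per (ground-truth class, predicted class) pair.

    `ignore_index` (assumed here to be 255) marks unlabeled pixels to skip.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average IoU = TP / (TP + FP + FN) over classes that occur at all."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # class c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp    # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:                                # skip classes absent from both pred and target
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: four pixels, two classes, last pixel unlabeled.
example_cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 255], num_classes=2)
example_miou = mean_iou(example_cm)
```

In practice the confusion matrix is accumulated over every image in the validation set before the per-class IoU values are averaged.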
Pages: 13