Contextual Guidance Network for Real-Time Semantic Segmentation of Autonomous Driving

Cited by: 0
Authors
Li, Wei [1 ,2 ]
Liao, Muxin [3 ]
Hua, Guoguang [1 ,2 ]
Zhang, Yuhang [4 ]
Zou, Wenbin [1 ,2 ]
Affiliations
[1] Shenzhen Univ, Guangdong Key Lab Intelligent Informat Proc, Shenzhen Key Lab Adv Machine Learning & Applicat, Shenzhen 518060, Peoples R China
[2] Shenzhen Univ, Coll Elect & Informat Engn, Shenzhen 518060, Peoples R China
[3] Jiangxi Agr Univ, Sch Comp Sci & Engn, Nanchang 330045, Peoples R China
[4] Guangzhou Univ, Sch Comp Sci & Cyber Engn, Guangzhou 510006, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Convolution; Computational modeling; Real-time systems; Computational efficiency; Accuracy; Feature extraction; Semantic segmentation; Autonomous vehicles; Computer architecture; Semantics; Real-time semantic segmentation; lightweight network; contextual information; feature fusion; FUSION NETWORK;
DOI
10.1109/TITS.2025.3568841
Chinese Library Classification
TU [Architecture Science];
Discipline Code
0813;
Abstract
With the rise of mobile computing and the increasing demand for real-time applications, the need for efficient and accurate semantic segmentation models has become paramount. However, existing state-of-the-art models are often hindered by heavy computational requirements, rendering them impractical for real-time applications. To tackle this challenge, we introduce the Contextual Guidance Network (CGNet), an efficient and lightweight network designed specifically for real-time semantic segmentation in autonomous driving. CGNet primarily consists of two key components: the Contextual Guidance Module (CGM) and the Triple-Branch Residual Fusion Module (TRFM). The CGM comprises the Downsampling Refine Unit (DRU) and the Contextual Guidance Bottleneck (CGB), which are utilized to gather dense contextual information. The DRU functions as a downsampling tool to generate low-resolution feature maps, while the CGB extracts rich contextual information from both spatial and channel dimensions. Additionally, the TRFM utilizes the Residual Fusion Module (RFM) to achieve feature fusion and enhance pixel prediction accuracy. Without bells and whistles, CGNet achieves impressive mean intersection over union (mIoU) scores of 77.11% with 1.00 million parameters at 86.71 frames per second (fps) on the Cityscapes dataset, 72.26% mIoU at 88.62 fps on the CamVid dataset, and 63.32% mIoU on the BDD100K dataset. Extensive experiments demonstrate that CGNet achieves a favorable tradeoff between segmentation accuracy, inference speed, and computational cost, making it suitable for autonomous driving systems with limited hardware resources. The source code will be available on GitHub: https://github.com/lv881314/CGNet
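The mIoU figures quoted above follow the standard per-class intersection-over-union evaluation used on Cityscapes, CamVid, and BDD100K. A minimal NumPy sketch of that metric (not the authors' evaluation code; the toy label maps below are invented for illustration):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection over union (mIoU) across classes.

    pred, gt: integer label maps of identical shape.
    Classes absent from both prediction and ground truth are skipped,
    so they do not drag the average down.
    """
    ious = []
    for c in range(num_classes):
        p = pred == c
        g = gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class appears in neither map
        inter = np.logical_and(p, g).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x3 label maps with 3 classes
pred = np.array([[0, 1, 1],
                 [2, 2, 0]])
gt = np.array([[0, 1, 2],
               [2, 2, 0]])
print(round(mean_iou(pred, gt, 3), 4))  # -> 0.7222
```

Benchmark evaluations accumulate the per-class intersections and unions over the whole validation set before dividing, rather than averaging per-image scores as this toy example does on a single pair of maps.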
Pages: 16