SEDNet: Real-Time Semantic Segmentation Algorithm Based on STDC

Cited by: 1
Authors
Ma, Sugang [1, 2]
Zhao, Ziyi [1 ]
Hou, Zhiqiang [1 ]
Yu, Wangsheng [3 ]
Yang, Xiaobao [1 ]
Zhao, Xiangmo [2 ]
Affiliations
[1] Xian Univ Posts & Telecommun, Sch Comp Sci & Technol, Xian, Peoples R China
[2] Changan Univ, Sch Informat Engn, Xian, Peoples R China
[3] Air Force Engn Univ, Sch Informat & Nav, Xian, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
deep convolutional neural network; encoder-decoder; real-time semantic segmentation; STDC; NETWORK
DOI
10.1155/int/8243407
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep convolutional neural networks (DCNNs) are now widely used for semantic segmentation and achieve high segmentation accuracy. However, most DCNN-based algorithms have high computational complexity, which makes them unsuitable for real-time segmentation. To address this problem, this paper proposes a real-time semantic segmentation algorithm based on the STDC network. The algorithm adopts an encoder-decoder structure within a U-shaped architecture to achieve real-time segmentation while maintaining high accuracy. Following the encoder, a mixed pooling attention module is designed to expand the receptive field, enhancing the model's learning ability in complex scenes. A feature fusion module then combines features from different stages, and channel attention based on atrous convolution is employed to expand the receptive field while avoiding dimensionality reduction. Finally, a Tversky-based detail loss function is used to encode more spatial detail. The proposed algorithm was extensively tested on the challenging Cityscapes and CamVid datasets, where it achieved 76.4% and 72.8% mIoU, respectively. Meanwhile, it reaches inference speeds of 105.2 FPS and 165.6 FPS on a single NVIDIA GTX 1080Ti GPU, meeting real-time segmentation requirements. The proposed algorithm thus achieves a good balance between accuracy and speed.
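The abstract names a Tversky-based detail loss but does not give its formula. As a rough illustration only, the sketch below computes the standard Tversky index, TP / (TP + alpha * FP + beta * FN), on a soft binary mask and returns one minus that value as a loss. The function name `tversky_loss`, the weights `alpha=0.3` and `beta=0.7`, and the use of NumPy are assumptions made for this example; they are not taken from the paper, and the paper's actual detail loss may combine this term with others.

```python
import numpy as np


def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-7):
    """Return 1 - Tversky index for a binary (or soft) mask.

    pred   : predicted probabilities in [0, 1], any shape
    target : ground-truth mask of the same shape, values in {0, 1}
    alpha  : weight on false positives (assumed value, not from the paper)
    beta   : weight on false negatives (assumed value, not from the paper)
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()

    tp = np.sum(pred * target)            # soft true positives
    fp = np.sum(pred * (1.0 - target))    # soft false positives
    fn = np.sum((1.0 - pred) * target)    # soft false negatives

    # eps keeps the ratio defined when both masks are empty.
    index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - index
```

With `alpha = beta = 0.5` the index reduces to the Dice coefficient, so this form can be read as a Dice-style loss whose penalty on false positives and false negatives is adjustable. Setting `beta` above `alpha` penalizes missed pixels more heavily, which is one plausible motivation for using it on sparse detail maps.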
Pages: 15