Based on cross-scale fusion attention mechanism network for semantic segmentation for street scenes

Cited by: 2
Authors
Ye, Xin [1]
Gao, Lang [1]
Chen, Jichen [2]
Lei, Mingyue [1]
Affiliations
[1] Xian Technol Univ, Inst Artificial Intelligence & Data Sci, Xian, Peoples R China
[2] Xian Microelect Technol Inst, Comp Part 3, Xian, Peoples R China
Keywords
computer vision; semantic segmentation; channel attention mechanism; residual block; dilation convolution; factorized convolution
DOI
10.3389/fnbot.2023.1204418
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Semantic segmentation is a fundamental task in computer vision that assigns a specific semantic class to every pixel of an image. High-accuracy segmentation algorithms are difficult to deploy on embedded systems and mobile devices. Despite the rapid development of semantic segmentation, the balance between speed and accuracy still needs improvement. To address these problems, we propose a cross-scale fusion attention mechanism network called CFANet, which fuses feature maps from different scales. We first design a novel efficient residual module (ERM), which applies both dilated convolution and factorized convolution. CFANet is mainly constructed from ERMs. We then design a new multi-branch channel attention mechanism (MCAM) to refine the feature maps at different levels. Experimental results show that CFANet, with 0.84M parameters, achieves 70.6% and 67.7% mean intersection over union (mIoU) on the Cityscapes and CamVid datasets, respectively, at inference speeds of 118 FPS and 105 FPS on an NVIDIA RTX 2080Ti GPU.
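The abstract reports accuracy as mean intersection over union (mIoU). As a minimal sketch of how that metric is commonly computed, the pure-Python example below counts true positives, false positives, and false negatives per class and averages the per-class IoU. The function name `mean_iou`, the `ignore_index` parameter, and the toy labels are assumptions made for illustration; this is not the authors' evaluation code, and benchmarks typically accumulate the counts over a whole dataset rather than per image.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Return (mIoU, per-class IoU list) for two equal-length label sequences.

    pred and target are flat sequences of integer class ids, one per pixel.
    Classes that never occur in pred or target get IoU None and are skipped
    when averaging. Pixels whose target equals ignore_index are not counted.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    tp = [0] * num_classes  # pixels correctly assigned to class c
    fp = [0] * num_classes  # pixels predicted as c whose true class differs
    fn = [0] * num_classes  # pixels of true class c predicted as another class

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    ious = []
    for c in range(num_classes):
        union = tp[c] + fp[c] + fn[c]  # |prediction ∪ ground truth| for class c
        ious.append(tp[c] / union if union else None)

    valid = [v for v in ious if v is not None]
    miou = sum(valid) / len(valid) if valid else 0.0
    return miou, ious


# Toy example: six pixels, three classes (hypothetical labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
miou, per_class = mean_iou(pred, target, num_classes=3)
print(miou, per_class)
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. A reported figure such as 70.6% mIoU is the same average taken over all classes of the benchmark.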
Pages: 12