Feature pyramid network with multi-scale prediction fusion for real-time semantic segmentation

Cited by: 15
Authors
Quyen, Toan Van [1]
Kim, Min Young [1, 2]
Affiliations
[1] Kyungpook Natl Univ, IT Coll, Sch Elect & Elect Engn, 1370 Sankyuk dong, Daegu 702701, South Korea
[2] Kyungpook Natl Univ, IT Coll, Res Ctr Neurosurg Robot Syst, 1370 Sankyuk dong, Daegu 702701, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Semantic segmentation; Feature pyramid network; Attention mechanism; Multi-scale fusion; Real time;
DOI
10.1016/j.neucom.2022.11.062
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
A feature pyramid network (FPN) is constructed from a bottom-up pathway and a top-down pathway. The method involves multi-scale features, so it can obtain rich contextual information from the lower scales and high resolution from the largest scale. Additionally, different receptive fields are effective at capturing both thin and large objects in image scenes. All feature maps are concatenated to predict the targets. However, average pooling has the drawback of combining the best predictions with poorer ones. In this paper, we propose a dual prediction to leverage the useful characteristics of each FPN feature map. A low-scale prediction attains good precision for large objects. The other one suitably segments narrow objects. Finally, a multi-scale fusion is deployed with an attention part. The attention module finds low-scale pixels that have a high probability of being wrongly labeled and then draws supplementary information from the high scale. The multi-scale fusion allows the network to learn across the different scales of predictions. We achieve good results: 77.9% mIoU at 62 FPS on Cityscapes and 44.1% mIoU on Mapillary Vistas. © 2022 Elsevier B.V. All rights reserved.
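The fusion idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the attention module, which in the paper is presumably learned, so everything below is an assumption for illustration only: the function names (`softmax`, `upsample_nearest`, `fuse_predictions`), the nearest-neighbor upsampling, and in particular the fixed attention rule, which treats one minus the maximum class probability of the low-scale prediction as a stand-in for "probability of a wrong label".

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def upsample_nearest(x, factor):
    """Nearest-neighbor upsampling of a (C, H, W) array by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_predictions(low_logits, high_logits, factor):
    """Blend a low-scale and a high-scale prediction per pixel.

    low_logits:  (C, H, W) class scores from the low-scale branch.
    high_logits: (C, H * factor, W * factor) class scores from the
                 high-scale branch.

    Hypothetical attention rule (NOT the paper's learned module):
    where the low-scale prediction is uncertain, rely more on the
    high-scale prediction.
    """
    p_low = softmax(upsample_nearest(low_logits, factor))
    p_high = softmax(high_logits)
    # Attention in [0, 1 - 1/C]: high where the low-scale branch is unsure.
    attn = 1.0 - p_low.max(axis=0, keepdims=True)
    fused = (1.0 - attn) * p_low + attn * p_high
    return fused, attn


rng = np.random.default_rng(0)
low = rng.normal(size=(3, 2, 2))    # 3 classes, coarse grid
high = rng.normal(size=(3, 4, 4))   # 3 classes, fine grid
fused, attn = fuse_predictions(low, high, factor=2)
labels = fused.argmax(axis=0)       # per-pixel class map of shape (4, 4)
```

Because the two inputs are probability maps and the weights sum to one at every pixel, the fused output is again a valid probability map. A learned attention module would replace the fixed `attn` rule with a small network trained jointly with the two prediction branches.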
Pages: 104-113
Page count: 10