MGSeg: Multiple Granularity-Based Real-Time Semantic Segmentation Network

Times cited: 27
Authors
He, Jun-Yan [1, 2]
Liang, Shi-Hua [1, 2]
Wu, Xiao [1, 2]
Zhao, Bo [3]
Zhang, Lei [4]
Affiliations
[1] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Xipu Campus, Chengdu 611756, Peoples R China
[2] Natl Engn Lab Integrated Transportat Big Data App, Chengdu 611756, Peoples R China
[3] Bank Montreal, Toronto, ON M5X 1A1, Canada
[4] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantics; Image segmentation; Real-time systems; Visualization; Task analysis; Noise measurement; Feature extraction; Semantic segmentation; real-time; multiple granularity
DOI
10.1109/TIP.2021.3102509
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent work on semantic segmentation has achieved significant performance gains by exploiting global contextual information. In this paper, an efficient multiple granularity-based semantic segmentation network (MGSeg) is proposed for real-time semantic segmentation; it models the latent relevance between multi-scale geometric details and high-level semantics to achieve fine-grained segmentation. Specifically, the lightweight ResNet-18 is first adopted as the backbone to produce hierarchical features. A Hybrid Attention Feature Aggregation (HAFA) module is designed to filter noisy spatial details from the features, acquire scale-invariant representations, and alleviate the vanishing-gradient problem in early-stage feature learning. After the learned features are aggregated, a Fine Granularity Refinement (FGR) module explicitly models the relationship between the multi-level features and the categories, generating appropriate weights for fusion. More importantly, to meet real-time processing requirements, a series of lightweight strategies and simplified structures is applied to improve efficiency, including the lightweight backbone, channel compression, and a narrow neck structure, among others. Extensive experiments on the Cityscapes and CamVid benchmark datasets demonstrate that the proposed method achieves state-of-the-art performance, 77.8% at 50 fps on Cityscapes and 72.7% at 127 fps on CamVid, making it suitable for real-time applications.
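The throughput figures quoted in the abstract can be read as per-frame latency budgets. The following is a minimal Python sketch, assuming the fps values are simple averages of frames processed per second (the abstract does not state the input resolution or hardware). It converts them to milliseconds per frame; the helper name `fps_to_ms` is hypothetical and not part of the paper.

```python
def fps_to_ms(fps: float) -> float:
    """Convert throughput in frames per second to average latency per frame in ms.

    Hypothetical helper for illustration; assumes fps is a plain average rate.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Figures quoted in the abstract: (dataset, reported accuracy %, reported fps)
reported = [("Cityscapes", 77.8, 50), ("CamVid", 72.7, 127)]

for dataset, acc, fps in reported:
    # Print the per-frame time budget implied by each reported frame rate
    print(f"{dataset}: {acc}% at {fps} fps -> {fps_to_ms(fps):.2f} ms/frame")
```

Under this reading, 50 fps corresponds to 20.00 ms per frame and 127 fps to about 7.87 ms per frame.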
Pages: 7200-7214
Page count: 15