MGSeg: Multiple Granularity-Based Real-Time Semantic Segmentation Network

Cited by: 27

Authors:

He, Jun-Yan [1,2]

Liang, Shi-Hua [1,2]

Wu, Xiao [1,2]

Zhao, Bo [3]

Zhang, Lei [4]

Affiliations:

[1] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Xipu Campus, Chengdu 611756, Peoples R China

[2] Natl Engn Lab Integrated Transportat Big Data App, Chengdu 611756, Peoples R China

[3] Bank Montreal, Toronto, ON M5X 1A1, Canada

[4] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2021 / Vol. 30

Funding:

National Natural Science Foundation of China;

Keywords:

Semantics; Image segmentation; Real-time systems; Visualization; Task analysis; Noise measurement; Feature extraction; Semantic segmentation; real-time; multiple granularity;

DOI:

10.1109/TIP.2021.3102509

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recent works on semantic segmentation have witnessed significant performance improvements from utilizing global contextual information. In this paper, an efficient multi-granularity based semantic segmentation network (MGSeg) is proposed for real-time semantic segmentation, which models the latent relevance between multi-scale geometric details and high-level semantics for fine-granularity segmentation. In particular, a light-weight ResNet-18 backbone is first adopted to produce hierarchical features. Hybrid Attention Feature Aggregation (HAFA) is designed to filter the noisy spatial details of the features, acquire a scale-invariant representation, and alleviate the vanishing-gradient problem in early-stage feature learning. After the learned features are aggregated, a Fine Granularity Refinement (FGR) module is employed to explicitly model the relationship between the multi-level features and the categories, generating proper weights for fusion. More importantly, to meet real-time processing requirements, a series of light-weight strategies and simplified structures are applied to improve efficiency, including a light-weight backbone, channel compression, a narrow neck structure, and so on. Extensive experiments conducted on the benchmark datasets Cityscapes and CamVid demonstrate that the proposed method achieves state-of-the-art performance, 77.8% at 50 fps on Cityscapes and 72.7% at 127 fps on CamVid, showing its capability for real-time applications.
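The abstract says that FGR generates weights for fusing multi-level features according to their relationship with the categories, but it gives no formula. The following is only an illustrative sketch of per-category weighted fusion at a single pixel, not the authors' FGR module: the function names `softmax` and `fuse_levels`, the `level_logits` relevance table, and the use of a softmax over levels are all assumptions made for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_levels(level_scores, level_logits):
    """Fuse per-level class scores for one pixel.

    level_scores[l][c]: score that feature level l assigns to category c.
    level_logits[l][c]: hypothetical relevance of level l to category c
                        (an assumption; in a real network it would be learned).
    Returns one fused score per category.
    """
    num_levels = len(level_scores)
    num_classes = len(level_scores[0])
    fused = []
    for c in range(num_classes):
        # Normalize the relevance of every level for this category.
        weights = softmax([level_logits[l][c] for l in range(num_levels)])
        # Weighted sum of the levels' scores for this category.
        fused.append(sum(weights[l] * level_scores[l][c] for l in range(num_levels)))
    return fused


if __name__ == "__main__":
    scores = [[1.0, 0.0], [3.0, 4.0]]   # two levels, two categories
    equal = [[0.0, 0.0], [0.0, 0.0]]    # equal relevance gives a plain average
    print(fuse_levels(scores, equal))
```

With equal relevance values the fusion reduces to averaging the levels; unequal values let one level dominate for a given category, which is the general idea of category-dependent fusion weights.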


Pages: 7200-7214

Page count: 15
