Pyramid Scene Parsing Network

Cited by: 10639
Authors
Zhao, Hengshuang [1]
Shi, Jianping [2]
Qi, Xiaojuan [1]
Wang, Xiaogang [1]
Jia, Jiaya [1]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Hong Kong, Peoples R China
[2] SenseTime Grp Ltd, Beijing, Peoples R China
Source
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) | 2017
DOI
10.1109/CVPR.2017.660
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Scene parsing is challenging for unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective to produce good quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction. The proposed approach achieves state-of-the-art performance on various datasets. It came first in ImageNet scene parsing challenge 2016, PASCAL VOC 2012 benchmark and Cityscapes benchmark. A single PSPNet yields the new record of mIoU accuracy 85.4% on PASCAL VOC 2012 and accuracy 80.2% on Cityscapes.
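The "different-region-based context aggregation" that the abstract attributes to the pyramid pooling module can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The bin sizes `(1, 2, 3, 6)` are an assumption not stated in this record, nearest-neighbour upsampling stands in for a learned or bilinear one, any per-level convolution is omitted, and the function names are hypothetical.

```python
import numpy as np

def adaptive_avg_pool(feat, bins):
    """Average-pool a (C, H, W) feature map down to (C, bins, bins)."""
    c, h, w = feat.shape
    out = np.empty((c, bins, bins), dtype=np.float64)
    for i in range(bins):
        # Region start uses floor and region end uses ceil,
        # so the regions together cover the whole map.
        r0, r1 = (i * h) // bins, -((-(i + 1) * h) // bins)
        for j in range(bins):
            c0, c1 = (j * w) // bins, -((-(j + 1) * w) // bins)
            out[:, i, j] = feat[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out

def upsample_nearest(feat, h, w):
    """Resize a (C, h', w') map back to (C, h, w) by nearest neighbour."""
    _, bh, bw = feat.shape
    rows = (np.arange(h) * bh) // h
    cols = (np.arange(w) * bw) // w
    return feat[:, rows[:, None], cols[None, :]]

def pyramid_pool(feat, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input with pooled context at several region scales.

    bin_sizes is an assumed setting, not taken from this record.
    """
    _, h, w = feat.shape
    levels = [feat.astype(np.float64)]
    for b in bin_sizes:
        pooled = adaptive_avg_pool(feat, b)   # coarse regional summary
        levels.append(upsample_nearest(pooled, h, w))
    return np.concatenate(levels, axis=0)     # stack along channels

feat = np.arange(2 * 6 * 6, dtype=np.float64).reshape(2, 6, 6)
out = pyramid_pool(feat)
print(out.shape)
```

With a single bin the pooled level carries one global average per channel, while finer bins keep progressively more local context. Concatenating them gives each pixel access to both.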
Pages: 6230-6239
Page count: 10