Encoder-decoder with double spatial pyramid for semantic segmentation

Times cited: 1
Authors
Kong, Huifang [1]
Hu, Jie [1]
Fan, Lei [1]
Zhang, Xiaoxue [1]
Fang, Yao [1]
Affiliation
[1] Hefei University of Technology, School of Electrical Engineering and Automation, Hefei, Anhui, People's Republic of China
Keywords
semantic segmentation; encoder-decoder; spatial pyramid; attention mechanism; neural network
DOI
10.1117/1.JEI.28.6.063007
Chinese Library Classification (CLC) codes
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline codes
0808; 0809
Abstract
Semantic segmentation, as a dense pixelwise classification task, is of great significance to scene understanding. Many approaches based on convolutional neural networks still face two challenges: (1) insufficient semantic information causes confusion between similar categories, and (2) loss of spatial information leads to inaccurate localization of inconspicuous objects. To tackle these challenges, we design an encoder-decoder network built on two proposed modules: a global pyramid attention module (GPAM) and a pyramid decoder module (PDM). Specifically, GPAM uses an attention mechanism as global prior knowledge to adaptively capture discriminative features and enhance semantic representation, while PDM employs small convolutions connected in parallel to predict adjacent-position relationships and refine spatial information. A series of ablation experiments demonstrates the effectiveness of our designs, and our network achieves a mean intersection over union (mIoU) of 83.4% on the PASCAL VOC 2012 dataset and 78.5% on the Cityscapes dataset. (C) 2019 SPIE and IS&T
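The reported results are mean intersection over union (mIoU) scores. As a minimal sketch, assuming the usual protocol of accumulating one confusion matrix over all evaluated pixels and skipping a "void" label (255 here, an assumed convention common to PASCAL VOC and Cityscapes label maps), per-class IoU is TP / (TP + FP + FN) and mIoU is the average over classes. The function names are hypothetical; this is not the authors' evaluation code.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Rows index the ground-truth class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # assumed void label: unlabeled pixels are not scored
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average of TP / (TP + FP + FN) over classes present in prediction or ground truth."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class c predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp   # other classes predicted as c
        denom = tp + fp + fn
        if denom > 0:                               # skip classes absent from both
            ious.append(tp / denom)
    if not ious:
        raise ValueError("no labeled pixels to evaluate")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    target = [0, 0, 1, 1, 2, 2, 255]  # last pixel is void and ignored
    pred = [0, 1, 1, 1, 2, 0, 2]
    cm = confusion_matrix(pred, target, num_classes=3)
    print(mean_iou(cm))  # 0.5, i.e. (1/3 + 2/3 + 1/2) / 3
```

Accumulating counts over the whole dataset before dividing, rather than averaging per-image IoUs, keeps small images from dominating the score.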
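This record does not describe the internals of GPAM. For intuition only, the sketch below shows squeeze-and-excitation-style channel attention (Hu et al.), a well-known way to turn globally pooled context into per-channel weights that rescale a feature map. It is a swapped-in generic technique, not the authors' GPAM; the pyramid part of GPAM is not modeled, and the weight matrices `w1`, `w2` and the reduction ratio are hypothetical stand-ins for learned parameters.

```python
import numpy as np


def channel_attention(features: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """SE-style gating (generic technique, not GPAM).

    features: (C, H, W) feature map
    w1: (C // r, C) hypothetical learned bottleneck weights
    w2: (C, C // r) hypothetical learned expansion weights
    """
    squeeze = features.mean(axis=(1, 2))         # (C,) global average pooling = global context
    hidden = np.maximum(w1 @ squeeze, 0.0)       # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid: one weight in (0, 1) per channel
    return features * gate[:, None, None]        # rescale each channel map


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 16, 16))         # C=8, reduction ratio r=4 (assumed)
    out = channel_attention(x, rng.standard_normal((2, 8)), rng.standard_normal((8, 2)))
    print(out.shape)  # (8, 16, 16)
```

Because every gate lies in (0, 1), the module can only reweight channels relative to one another, emphasizing those the global context marks as discriminative.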
Pages: 10