Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation

被引：213

作者：

Tian, Zhi ^{[1
]}

He, Tong ^{[1
]}

Shen, Chunhua ^{[1
]}

Yan, Youliang ^{[2
]}

机构：

[1] Univ Adelaide, Adelaide, SA, Australia

[2] Huawei Technol, Noahs Ark Lab, Shenzhen, Guangdong, Peoples R China

来源：

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019年

关键词：

D O I：

10.1109/CVPR.2019.00324

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Recent semantic segmentation methods exploit encoder-decoder architectures to produce the desired pixel-wise segmentation prediction. The last layer of the decoders is typically a bilinear upsampling procedure to recover the final pixel-wise prediction. We empirically show that this oversimple and data-independent bilinear upsampling may lead to sub-optimal results. In this work, we propose a data-dependent upsampling (DUpsampling) to replace bilinear, which takes advantages of the redundancy in the label space of semantic segmentation and is able to recover the pixel-wise prediction from low-resolution outputs of CNNs. The main advantage of the new upsampling layer lies in that with a relatively lower-resolution feature map such as 1/16 or 1/32 of the input size, we can achieve even better segmentation accuracy, significantly reducing computation complexity. This is made possible by 1) the new upsampling layer's much improved reconstruction capability; and more importantly 2) the DUpsampling based decoder's flexibility in leveraging almost arbitrary combinations of the CNN encoders' features. Experiments on PASCAL VOC demonstrate that with much less computation complexity, our decoder outperforms the state-of-the-art decoder. Finally, without any post-processing, the framework equipped with our proposed decoder achieves new state-of-the-art performance on two datasets: 88.1% mIOU on PASCAL VOC with 30% computation of the previously best model; and 52.5% mIOU on PASCAL Context.

引用

页码：3121 / 3130

页数：10

共 36 条

[1]

[Anonymous], 2016, PROC CVPR IEEE, DOI DOI 10.1109/CVPR.2016.348

[2]

[Anonymous], 2017, ARXIV170804943

[3]

[Anonymous], 2016, ARXIV

[4] Higher Order Conditional Random Fields in Deep Neural Networks [J].

Arnab, Anurag ;

Jayasumana, Sadeep ;

Zheng, Shuai ;

Torr, Philip H. S. .

COMPUTER VISION - ECCV 2016, PT II, 2016, 9906 :524-540

[5] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation [J].

Badrinarayanan, Vijay ;

Kendall, Alex ;

Cipolla, Roberto .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (12) :2481-2495

[6]

Berg A.C., 2015, ARXIV150604579

[7]

Chen L.C., 2018, P EUR C COMP VIS

[8] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs [J].

Chen, Liang-Chieh ;

Papandreou, George ;

Kokkinos, Iasonas ;

Murphy, Kevin ;

Yuille, Alan L. .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (04) :834-848

[9]

Chen LB, 2017, IEEE INT SYMP NANO, P1, DOI 10.1109/NANOARCH.2017.8053709

[10] Xception: Deep Learning with Depthwise Separable Convolutions [J].

Chollet, Francois .

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :1800-1807

← 1 2 3 4 →