Denoising Pretraining for Semantic Segmentation

Cited by: 76
Authors
Brempong, Emmanuel Asiedu [1 ,2 ]
Kornblith, Simon [1 ]
Chen, Ting [1 ]
Parmar, Niki [1 ]
Minderer, Matthias [1 ]
Norouzi, Mohammad [1 ]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
[2] Google AI Residency, Mountain View, CA USA
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022 | 2022
DOI
10.1109/CVPRW56347.2022.00462
CLC Classification Number
TP301 [Theory and Methods];
Discipline Classification Code
081202 ;
Abstract
Semantic segmentation labels are expensive and time-consuming to acquire. To improve the label efficiency of semantic segmentation models, we revisit denoising autoencoders and study the use of a denoising objective for pretraining UNets. We pretrain a Transformer-based UNet as a denoising autoencoder, followed by fine-tuning on semantic segmentation using few labeled examples. Denoising pretraining outperforms training from random initialization, and even supervised ImageNet-21K pretraining of the encoder when the number of labeled images is small. A key advantage of denoising pretraining over supervised pretraining of the backbone is the ability to pretrain the decoder, which would otherwise be randomly initialized. We thus propose a novel Decoder Denoising Pretraining (DDeP) method, in which we initialize the encoder using supervised learning and pretrain only the decoder using the denoising objective. Despite its simplicity, DDeP achieves state-of-the-art results on label-efficient semantic segmentation, offering considerable gains on the Cityscapes, Pascal Context, and ADE20K datasets.
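The denoising objective described in the abstract can be illustrated with a minimal sketch: corrupt a clean image with Gaussian noise and train the UNet to predict the added noise under an L2 loss. This is an illustrative simplification, not the paper's exact formulation; the function name, noise scale `sigma`, and the noise-prediction target are assumptions (the paper studies several target and scaling variants), and `model` stands in for any UNet-like callable.

```python
import numpy as np

def denoising_pretrain_step(model, x, sigma=0.2, rng=None):
    """One denoising-pretraining step (illustrative sketch).

    Corrupts the clean batch `x` with additive Gaussian noise and scores
    the model's prediction of that noise with an L2 loss. `model` is any
    callable mapping a noisy batch to a noise estimate of the same shape.
    """
    rng = np.random.default_rng(rng)
    noise = rng.normal(size=x.shape)      # epsilon ~ N(0, I)
    x_noisy = x + sigma * noise           # corrupted input fed to the UNet
    pred = model(x_noisy)                 # UNet forward pass (noise estimate)
    loss = np.mean((pred - noise) ** 2)   # L2 loss on the added noise
    return loss
```

After pretraining with a loss of this form, the (decoder) weights initialize the segmentation network, which is then fine-tuned on the few available labeled examples.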
Pages: 4174-4185
Page count: 12