Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer

Cited by: 118
Authors
Lu, Zhihe [1, 2]
He, Sen [1, 2]
Zhu, Xiatian [1]
Zhang, Li [3]
Song, Yi-Zhe [1, 2]
Xiang, Tao [1, 2]
Affiliations
[1] Univ Surrey, CVSSP, Guildford, Surrey, England
[2] iFlyTek Surrey Joint Res Ctr Artificial Intellige, Guildford, Surrey, England
[3] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China
Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021
DOI
10.1109/ICCV48922.2021.00862
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A few-shot semantic segmentation model is typically composed of a CNN encoder, a CNN decoder and a simple classifier (separating foreground and background pixels). Most existing methods meta-learn all three model components for fast adaptation to a new class. However, given that as few as a single support-set image is available, effective model adaptation of all three components to the new class is extremely challenging. In this work we propose to simplify the meta-learning task by focusing solely on the simplest component, the classifier, whilst leaving the encoder and decoder to pre-training. We hypothesize that if we pre-train an off-the-shelf segmentation model over a set of diverse training classes with sufficient annotations, the encoder and decoder can capture rich discriminative features applicable to any unseen class, rendering the subsequent meta-learning stage unnecessary. For the classifier meta-learning, we introduce a Classifier Weight Transformer (CWT) designed to dynamically adapt the support-set trained classifier's weights to each query image in an inductive way. Extensive experiments on two standard benchmarks show that despite its simplicity, our method outperforms the state-of-the-art alternatives, often by a large margin.
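The abstract does not spell out how the classifier weights are adapted, so the following is only a minimal sketch of one plausible reading: each class weight vector attends over the query image's pixel features and is updated with a residual connection. The function names (`adapt_classifier`, `classify`), the scaled dot-product form, and the omission of learned projection layers are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def adapt_classifier(weights, query_feats):
    """Hypothetical weight adaptation step (an assumption, not the paper's code).

    weights:     list of class weight vectors (e.g. background, foreground),
                 as trained on the support set; each has dimension d.
    query_feats: list of per-pixel feature vectors of one query image,
                 each of dimension d (frozen encoder/decoder output).
    Returns new weights: w + attention(w, F, F), with identity projections.
    """
    d = len(weights[0])
    scale = math.sqrt(d)
    adapted = []
    for w in weights:
        # Attention of this class weight over every query pixel feature.
        attn = softmax([dot(w, f) / scale for f in query_feats])
        # Attention-weighted average of the query features.
        context = [sum(a * f[k] for a, f in zip(attn, query_feats))
                   for k in range(d)]
        # Residual update keeps the support-set solution as the starting point.
        adapted.append([wi + ci for wi, ci in zip(w, context)])
    return adapted


def classify(weights, feat):
    """Assign a pixel feature to the class with the highest linear score."""
    logits = [dot(w, feat) for w in weights]
    return max(range(len(logits)), key=logits.__getitem__)
```

Under this reading, only the small classifier is adjusted per query image, while the features come from a pre-trained encoder and decoder that are left untouched.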
Pages: 8721-8730
Page count: 10