Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer

Cited by: 118

Authors:

Lu, Zhihe [1,2]

He, Sen [1,2]

Zhu, Xiatian [1]

Zhang, Li [3]

Song, Yi-Zhe [1,2]

Xiang, Tao [1,2]

Affiliations:

[1] Univ Surrey, CVSSP, Guildford, Surrey, England

[2] iFlyTek Surrey Joint Res Ctr Artificial Intellige, Guildford, Surrey, England

[3] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China

Source:

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021

DOI:

10.1109/ICCV48922.2021.00862

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A few-shot semantic segmentation model is typically composed of a CNN encoder, a CNN decoder, and a simple classifier that separates foreground from background pixels. Most existing methods meta-learn all three components for fast adaptation to a new class. However, given that as few as a single support-set image may be available, effectively adapting all three components to the new class is extremely challenging. In this work we propose to simplify the meta-learning task by focusing solely on the simplest component, the classifier, whilst leaving the encoder and decoder to pre-training. We hypothesize that if we pre-train an off-the-shelf segmentation model over a set of diverse training classes with sufficient annotations, the encoder and decoder can capture rich discriminative features applicable to any unseen class, rendering a subsequent meta-learning stage for these two components unnecessary. For the classifier meta-learning, we introduce a Classifier Weight Transformer (CWT) designed to dynamically adapt the support-set-trained classifier's weights to each query image in an inductive way. Extensive experiments on two standard benchmarks show that, despite its simplicity, our method outperforms state-of-the-art alternatives, often by a large margin.
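The abstract does not spell out how the classifier weights are adapted, so the following is only a minimal sketch of one plausible reading: a single-head attention step in which the two classifier weight vectors (background, foreground) attend over the query image's pixel features and are updated through a residual connection. The function names, the projection matrices `Wq`, `Wk`, `Wv`, and all shapes are assumptions made for illustration, not details taken from this record.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def adapt_classifier_weights(w, feats, Wq, Wk, Wv):
    """Hypothetical attention-based weight adaptation.

    w:     (2, d) classifier weights fitted on the support set
    feats: (n, d) pixel features of one query image (frozen encoder/decoder)
    Wq, Wk, Wv: (d, d) projection matrices (assumed to be the meta-learned part)
    """
    d = w.shape[1]
    q = w @ Wq                                     # weights act as attention queries
    k = feats @ Wk                                 # query-image features act as keys
    v = feats @ Wv                                 # ... and as values
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)  # (2, n) attention over pixels
    return w + attn @ v                            # residual update of the weights


def segment(w, feats):
    # Linear classifier: per-pixel argmax over background/foreground logits.
    return (feats @ w.T).argmax(axis=1)


rng = np.random.default_rng(0)
d, n = 8, 16
w = rng.normal(size=(2, d))
feats = rng.normal(size=(n, d))
Wq, Wk, Wv = (0.1 * rng.normal(size=(d, d)) for _ in range(3))

w_adapted = adapt_classifier_weights(w, feats, Wq, Wk, Wv)
mask = segment(w_adapted, feats)
```

Under these assumptions, only the small projection matrices would be trained episodically, while the feature extractor stays fixed. The residual form means that a zero value projection leaves the support-set classifier unchanged.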

Pages: 8721-8730

Page count: 10
