Semantic Prompt for Few-Shot Image Recognition

Cited by: 32

Authors:

Chen, Wentao [1,2]

Si, Chenyang [3]

Zhang, Zhang [2,4]

Wang, Liang [2,4]

Wang, Zilei [1]

Tan, Tieniu [1,2,4]

Affiliations:

[1] Univ Sci & Technol China, Hefei, Peoples R China

[2] CASIA, NLPR, Ctr Res Intelligent Percept & Comp, Hangzhou, Peoples R China

[3] Nanyang Technol Univ, Singapore, Singapore

[4] Univ Chinese Acad Sci, Beijing, Peoples R China

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords:

DATA-EFFICIENT;

DOI:

10.1109/CVPR52729.2023.02258

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Few-shot learning is a challenging problem since only a few examples are provided to recognize a new class. Several recent studies exploit additional semantic information, e.g., text embeddings of class names, to address the issue of rare samples through combining semantic prototypes with visual prototypes. However, these methods still suffer from the spurious visual features learned from the rare support samples, resulting in limited benefits. In this paper, we propose a novel Semantic Prompt (SP) approach for few-shot learning. Instead of the naive exploitation of semantic information for remedying classifiers, we explore leveraging semantic information as prompts to tune the visual feature extraction network adaptively. Specifically, we design two complementary mechanisms to insert semantic prompts into the feature extractor: one is to enable the interaction between semantic prompts and patch embeddings along the spatial dimension via self-attention, the other is to supplement visual features with the transformed semantic prompts along the channel dimension. By combining these two mechanisms, the feature extractor presents a better ability to attend to the class-specific features and obtains more generalized image representations with merely a few support samples. Through extensive experiments on four datasets, the proposed approach achieves promising results, improving the 1-shot learning accuracy by 3.67% on average.
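The two mechanisms named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the identity query/key/value projections, the single attention head, the toy dimensions, and the linear map `w` are all assumptions made here for illustration. The spatial variant prepends the semantic prompt as an extra token before self-attention; the channel variant adds a linearly transformed prompt to every patch embedding.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def spatial_interaction(patches, prompt):
    """Spatial mechanism (sketch): treat the prompt as one extra token and
    run single-head self-attention over [prompt] + patches.
    Assumption: identity query/key/value projections for brevity."""
    tokens = [prompt] + patches
    d = len(prompt)
    scores = matmul(tokens, transpose(tokens))
    attn = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(attn, tokens)  # one output row per token, prompt first


def channel_interaction(patches, prompt, w):
    """Channel mechanism (sketch): map the prompt through a hypothetical
    linear layer `w` (d x d) and add the result to every patch embedding."""
    shift = [sum(wi * pi for wi, pi in zip(row, prompt)) for row in w]
    return [[p + s for p, s in zip(patch, shift)] for patch in patches]


# Toy inputs: 3 patch embeddings and 1 semantic prompt, all of dimension 4.
random.seed(0)
d, n = 4, 3
patches = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
prompt = [random.gauss(0, 1) for _ in range(d)]
w = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(d)]

spatial_out = spatial_interaction(patches, prompt)
channel_out = channel_interaction(patches, prompt, w)
```

In this sketch the spatial path lets each patch attend to the prompt token (and vice versa), while the channel path shifts all patches by the same prompt-derived vector. A real feature extractor would use learned projections and would place these operations inside its transformer layers.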

Citation

Pages: 23581-23591

Page count: 11

