Focus Your Attention when Few-Shot Classification

Cited: 0
Authors
Wang, Haoqing [1 ]
Jie, Shibo [1 ]
Deng, Zhi-Hong [1 ]
Affiliations
[1] Peking Univ, Sch Intelligence Sci & Technol, Beijing, Peoples R China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Keywords
NEURAL-NETWORKS;
DOI
Not available
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Since many pre-trained vision transformers have emerged and provide strong representations for various downstream tasks, we aim to adapt them to few-shot image classification tasks in this work. The input images typically contain multiple entities. The model may not focus on the class-related entities for the current few-shot task, even with fine-tuning on support samples, and the noise from the class-independent entities harms performance. To this end, we first propose a method that uses the attention and gradient information to automatically locate the positions of key entities in the support images, denoted as position prompts. Then we employ the cross-entropy loss between their many-hot representation and the attention logits to optimize the model to focus its attention on the key entities during fine-tuning. This ability can then generalize to the query samples. Our method is applicable to different vision transformers (e.g., columnar or pyramidal ones), and also to different pre-training ways (e.g., single-modal or vision-language pre-training). Extensive experiments show that our method can improve the performance of full or parameter-efficient fine-tuning methods on few-shot tasks. Code is available at https://github.com/Haoqing-Wang/FORT.
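The core objective described in the abstract can be illustrated with a minimal sketch: a cross-entropy loss between a many-hot target over patch positions (the "position prompt" marking key entities) and the model's attention logits. This is not the authors' implementation (see the linked repository for that); the function name, the plain-Python softmax, and the uniform weighting over key positions are all assumptions made here for illustration.

```python
import math

def focus_loss(attn_logits, key_positions):
    """Hypothetical sketch of an attention-focusing objective:
    cross-entropy between a many-hot target over patch positions
    and the attention logits, encouraging the model to attend
    to the located key entities."""
    # Log-softmax over all patch positions (numerically stable).
    m = max(attn_logits)
    exps = [math.exp(a - m) for a in attn_logits]
    log_z = m + math.log(sum(exps))
    log_probs = [a - log_z for a in attn_logits]
    # Many-hot target: uniform mass on the key-entity positions.
    k = len(key_positions)
    return -sum(log_probs[p] for p in key_positions) / k
```

With this sketch, attention concentrated on a key position yields a lower loss than attention on a class-independent position, which is the behavior the fine-tuning objective is meant to induce.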
Pages: 19