Focus Your Attention when Few-Shot Classification

Cited: 0
Authors
Wang, Haoqing [1 ]
Jie, Shibo [1 ]
Deng, Zhi-Hong [1 ]
Affiliations
[1] Peking Univ, Sch Intelligence Sci & Technol, Beijing, Peoples R China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Keywords
NEURAL-NETWORKS;
DOI
Not available
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Since many pre-trained vision transformers have emerged and provide strong representations for various downstream tasks, we aim to adapt them to few-shot image classification tasks in this work. The input images typically contain multiple entities. The model may not focus on the class-related entities for the current few-shot task, even with fine-tuning on support samples, and the noise from the class-independent entities harms performance. To this end, we first propose a method that uses the attention and gradient information to automatically locate the positions of key entities in the support images, denoted as position prompts. Then we employ the cross-entropy loss between their many-hot representation and the attention logits to optimize the model to focus its attention on the key entities during fine-tuning. This ability can then generalize to the query samples. Our method is applicable to different vision transformers (e.g., columnar or pyramidal ones), and also to different pre-training ways (e.g., single-modal or vision-language pre-training). Extensive experiments show that our method can improve the performance of full or parameter-efficient fine-tuning methods on few-shot tasks. Code is available at https://github.com/Haoqing-Wang/FORT.
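The core objective described in the abstract can be illustrated with a minimal sketch: a cross-entropy loss between a many-hot target over patch positions (the "position prompt" marking key entities) and the model's attention logits. This is not the authors' implementation (see the linked repository for that); the function name, the plain-Python softmax, and the uniform weighting over key positions are all assumptions made here for illustration.

```python
import math

def focus_loss(attn_logits, key_positions):
    """Hypothetical sketch of an attention-focusing objective:
    cross-entropy between a many-hot target over patch positions
    and the attention logits, encouraging the model to attend
    to the located key entities."""
    # Log-softmax over all patch positions (numerically stable).
    m = max(attn_logits)
    exps = [math.exp(a - m) for a in attn_logits]
    log_z = m + math.log(sum(exps))
    log_probs = [a - log_z for a in attn_logits]
    # Many-hot target: uniform mass on the key-entity positions.
    k = len(key_positions)
    return -sum(log_probs[p] for p in key_positions) / k
```

With this sketch, attention concentrated on a key position yields a lower loss than attention on a class-independent position, which is the behavior the fine-tuning objective is meant to induce.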
Pages: 19