Semantic Prompt for Few-Shot Image Recognition

Cited by: 34
Authors
Chen, Wentao [1 ,2 ]
Si, Chenyang [3 ]
Zhang, Zhang [2 ,4 ]
Wang, Liang [2 ,4 ]
Wang, Zilei [1 ]
Tan, Tieniu [1 ,2 ,4 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] CASIA, NLPR, Ctr Res Intelligent Percept & Comp, Hangzhou, Peoples R China
[3] Nanyang Technol Univ, Singapore, Singapore
[4] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
DATA-EFFICIENT;
DOI
10.1109/CVPR52729.2023.02258
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Few-shot learning is a challenging problem since only a few examples are provided to recognize a new class. Several recent studies exploit additional semantic information, e.g., text embeddings of class names, to address the issue of rare samples by combining semantic prototypes with visual prototypes. However, these methods still suffer from spurious visual features learned from the rare support samples, resulting in limited benefits. In this paper, we propose a novel Semantic Prompt (SP) approach for few-shot learning. Instead of naively exploiting semantic information to remedy classifiers, we explore leveraging semantic information as prompts to tune the visual feature extraction network adaptively. Specifically, we design two complementary mechanisms to insert semantic prompts into the feature extractor: one enables interaction between semantic prompts and patch embeddings along the spatial dimension via self-attention; the other supplements visual features with the transformed semantic prompts along the channel dimension. By combining these two mechanisms, the feature extractor is better able to attend to class-specific features and obtains more generalized image representations from merely a few support samples. Through extensive experiments on four datasets, the proposed approach achieves promising results, improving 1-shot learning accuracy by 3.67% on average.
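The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single-head attention, the function names (`spatial_interaction`, `channel_supplement`), and the random projection `W` are all illustrative assumptions, standing in for the paper's Transformer blocks and learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_interaction(patches, prompt):
    """Spatial mechanism: prepend the semantic prompt as an extra token
    and run one self-attention step, so each patch embedding can attend
    to the prompt along the spatial (token) dimension."""
    tokens = np.vstack([prompt[None, :], patches])          # (1+N, D)
    attn = softmax(tokens @ tokens.T / np.sqrt(tokens.shape[1]))
    return (attn @ tokens)[1:]                              # updated patches, (N, D)

def channel_supplement(patches, prompt, W):
    """Channel mechanism: transform the prompt and add it to every
    patch feature along the channel dimension."""
    return patches + (W @ prompt)[None, :]                  # broadcast over patches

# Toy example: 4 patch embeddings and one semantic prompt, dimension 8.
rng = np.random.default_rng(0)
N, D = 4, 8
patches = rng.standard_normal((N, D))
prompt = rng.standard_normal(D)
W = rng.standard_normal((D, D)) / np.sqrt(D)

out = channel_supplement(spatial_interaction(patches, prompt), prompt, W)
print(out.shape)  # (4, 8)
```

The key point the sketch captures is that the prompt conditions the feature extractor itself (the patch features change), rather than being mixed into a classifier prototype after extraction.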
Pages: 23581-23591
Page count: 11
References
59 in total
[11]   Randaugment: Practical automated data augmentation with a reduced search space [J].
Cubuk, Ekin D. ;
Zoph, Barret ;
Shlens, Jonathon ;
Le, Quoc, V .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020, :3008-3017
[12]  
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[13]  
Akyürek Afra Feyza, 2022, INT C LEARN REPR, V1
[14]  
Dong Bowen, 2022, arXiv:2203.07057
[15]  
Dosovitskiy A., 2021, ICLR
[16]  
Finn C, 2017, PR MACH LEARN RES, V70
[17]   Recent Advances in Zero-Shot Recognition Toward data-efficient understanding of visual content [J].
Fu, Yanwei ;
Xiang, Tao ;
Jiang, Yu-Gang ;
Xue, Xiangyang ;
Sigal, Leonid ;
Gong, Shaogang .
IEEE SIGNAL PROCESSING MAGAZINE, 2018, 35 (01) :112-125
[18]   Boosting Few-Shot Visual Learning with Self-Supervision [J].
Gidaris, Spyros ;
Bursuc, Andrei ;
Komodakis, Nikos ;
Perez, Patrick ;
Cord, Matthieu .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :8058-8067
[19]   Momentum Contrast for Unsupervised Visual Representation Learning [J].
He, Kaiming ;
Fan, Haoqi ;
Wu, Yuxin ;
Xie, Saining ;
Girshick, Ross .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :9726-9735
[20]   Local descriptor-based multi-prototype network for few-shot Learning [J].
Huang, Hongwei ;
Wu, Zhangkai ;
Li, Wenbin ;
Huo, Jing ;
Gao, Yang .
PATTERN RECOGNITION, 2021, 116