A CLIP Guided Model for Few-Shot Object Detection

Cited by: 1
Authors
Sun, Chang [1]
Li, Yuehua [1]
Xing, Yan [2]
Zhang, Weidong [3]
Ai, Yibo [3]
Wang, Sheng [3]
Li, Chao [1]
Affiliations
[1] Zhejiang Lab, Hangzhou, Peoples R China
[2] Beijing Inst Control Engn, Beijing, Peoples R China
[3] Univ Sci & Technol Beijing, Natl Ctr Mat Serv Safety, Beijing, Peoples R China
Source
INTELLIGENT ROBOTICS AND APPLICATIONS, ICIRA 2024, PT VII | 2025 / Vol. 15207
Funding
National Natural Science Foundation of China
Keywords
Deep learning; Few-shot object detection; CLIP; Transfer learning
DOI
10.1007/978-981-96-0780-8_1
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Detecting novel categories is a challenging task for few-shot object detection models because annotations are limited during training, which restricts the expressive power of the extracted detection features. In this paper, we focus on effectively leveraging prior information to enhance feature representation capability and propose a few-shot object detection method based on the Contrastive Language-Image Pre-training (CLIP) model, named Few-CLIP. We introduce an image-text association module that integrates text features from the CLIP text encoder with image features from the Few-CLIP backbone, thereby incorporating semantic information. Additionally, an Adapter is employed within the image-text association module to fine-tune features from the CLIP image encoder for Few-CLIP, thereby introducing more generalized image information. Evaluation of Few-CLIP on the PASCAL VOC and COCO datasets confirmed its effectiveness at detecting novel categories with limited annotations, achieving performance comparable to other state-of-the-art few-shot object detection methods.
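The abstract does not specify how the Adapter or the image-text association is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a bottleneck adapter with a residual blend (in the style of CLIP-Adapter) and a cosine-similarity match between an adapted image feature and per-class text features. All names (`adapter`, `associate`, `alpha`, `temperature`), the toy dimensions, and the random vectors standing in for CLIP and backbone features are hypothetical.

```python
import math
import random


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def l2_normalize(x):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]


def adapter(x, w_down, w_up, alpha=0.2):
    """Assumed bottleneck adapter: two linear layers with ReLU, blended with the input.

    alpha controls how much of the adapted feature replaces the original one;
    alpha = 0 returns the input feature unchanged.
    """
    hidden = [max(0.0, h) for h in matvec(w_down, x)]
    adapted = [max(0.0, a) for a in matvec(w_up, hidden)]
    return [alpha * a + (1.0 - alpha) * v for a, v in zip(adapted, x)]


def associate(image_feat, text_feats, temperature=0.07):
    """Assumed association step: softmax over scaled cosine similarities."""
    img = l2_normalize(image_feat)
    logits = [
        sum(a * b for a, b in zip(img, l2_normalize(t))) / temperature
        for t in text_feats
    ]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy stand-ins for real features: random vectors, not CLIP outputs.
rng = random.Random(0)
dim, bottleneck, num_classes = 8, 2, 3
w_down = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(bottleneck)]
w_up = [[rng.gauss(0, 0.1) for _ in range(bottleneck)] for _ in range(dim)]
text_feats = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_classes)]
region_feat = [rng.gauss(0, 1) for _ in range(dim)]

scores = associate(adapter(region_feat, w_down, w_up), text_feats)
print(scores)  # one probability per class
```

In a real detector the random vectors would be replaced by region features from the detection backbone and by text embeddings of class prompts; the blend weight `alpha` is a tunable assumption here rather than a value reported in the record.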
Pages: 3-17
Number of pages: 15