KDNet: Leveraging Vision-Language Knowledge Distillation for Few-Shot Object Detection

Citations: 0

Authors:

Ma, Mengyuan [1]

Qian, Lin [1]

Yin, Hujun [1]

Affiliations:

[1] Univ Manchester, Manchester M13 9PL, Lancs, England

Source:

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT II | 2024 / Vol. 15017

Keywords:

Object detection; Few-shot learning; Vision-language model; Knowledge distillation;

DOI:

10.1007/978-3-031-72335-3_11

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Few-shot object detection (FSOD) aims to detect new categories given only a few training instances. Recently emerged vision-language models (VLMs) have shown strong performance in zero-shot and open-vocabulary object detection owing to their ability to align object-level embeddings with the textual embeddings of categories. However, few existing FSOD models distill the object-level knowledge of VLMs, which could help a detector learn novel semantic concepts and improve further. Inspired by recent knowledge distillation approaches with VLMs, we propose KDNet, an end-to-end few-shot object detector with knowledge distillation from pre-trained VLMs. A knowledge distillation branch is introduced alongside the object detector to distill knowledge from the VLM's visual encoder into the detector. We also propose a pre-training mechanism on a large-scale dataset that injects more semantic concepts into the detector to improve performance on small datasets. KDNet achieves state-of-the-art performance on both the PASCAL VOC and MS COCO benchmarks across most shot settings and evaluation metrics.
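The abstract does not state the form of the distillation objective, so the following is only a minimal sketch of one common way a distillation branch can be trained: matching the detector's region embeddings to the embeddings a VLM visual encoder produces for the same regions. The L1 distance on L2-normalized embeddings, the function names, and the `kd_weight` parameter are all assumptions made for illustration and are not taken from the KDNet paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def distillation_loss(student_embeddings, teacher_embeddings):
    """Mean L1 distance between normalized student and teacher region embeddings.

    Hypothetical loss: student_embeddings come from the detector's
    distillation branch, teacher_embeddings from a frozen VLM visual
    encoder applied to the same region proposals.
    """
    if len(student_embeddings) != len(teacher_embeddings):
        raise ValueError("student and teacher must cover the same regions")
    if not student_embeddings:
        return 0.0
    total = 0.0
    for student, teacher in zip(student_embeddings, teacher_embeddings):
        if len(student) != len(teacher):
            raise ValueError("embedding dimensions must match")
        s_norm, t_norm = l2_normalize(student), l2_normalize(teacher)
        # Average the absolute differences over the embedding dimension.
        total += sum(abs(a - b) for a, b in zip(s_norm, t_norm)) / len(s_norm)
    # Average over all regions.
    return total / len(student_embeddings)


def total_loss(detection_loss, kd_loss, kd_weight=1.0):
    """Combine the detection loss with the weighted distillation term."""
    return detection_loss + kd_weight * kd_loss
```

Under these assumptions, embeddings that point in the same direction give a loss of zero regardless of their magnitude, so the branch is pushed to match the teacher's direction in embedding space rather than its scale.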

Pages: 153-167

Page count: 15
