KDNet: Leveraging Vision-Language Knowledge Distillation for Few-Shot Object Detection

Authors
Ma, Mengyuan [1]
Qian, Lin [1]
Yin, Hujun [1]
Affiliations
[1] Univ Manchester, Manchester M13 9PL, Lancs, England
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT II | 2024 / Vol. 15017
Keywords
Object detection; Few-shot learning; Vision-language model; Knowledge distillation
DOI
10.1007/978-3-031-72335-3_11
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Few-shot object detection (FSOD) aims to detect new categories given only a few training instances. Recently emerged vision-language models (VLMs) have shown strong performance in zero-shot and open-vocabulary object detection, owing to their ability to align object-level embeddings with the textual embeddings of categories. However, few existing FSOD models distill the object-level knowledge of VLMs, even though doing so can help a detector learn novel semantic concepts and improve further. Inspired by recent knowledge distillation approaches with VLMs, we propose KDNet, an end-to-end few-shot object detector with knowledge distillation from pre-trained VLMs. A knowledge distillation branch is introduced alongside the object detector to distill knowledge from the VLM's visual encoder into the detector. We also propose a pre-training mechanism that uses a large-scale dataset to inject more semantic concepts into the detector and improve its performance on small datasets. KDNet achieves state-of-the-art performance on both the PASCAL VOC and MS COCO benchmarks across most shot settings and evaluation metrics.
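The abstract does not state which loss the knowledge distillation branch uses. As an illustration only, the sketch below assumes a feature-level objective: the mean L1 distance between L2-normalised region embeddings from the detector (student) and from the VLM visual encoder (teacher). The function names `l2_normalize` and `distillation_loss` are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def distillation_loss(student_embeddings, teacher_embeddings):
    """Mean L1 distance between normalised student and teacher embeddings.

    ASSUMPTION: an L1 feature-matching loss is used purely for illustration;
    the abstract does not specify KDNet's actual objective.
    Each argument is a list of per-region embedding vectors of equal length.
    """
    if len(student_embeddings) != len(teacher_embeddings):
        raise ValueError("student and teacher must cover the same regions")
    if not student_embeddings:
        return 0.0

    total = 0.0
    for student, teacher in zip(student_embeddings, teacher_embeddings):
        if len(student) != len(teacher):
            raise ValueError("embedding dimensions must match")
        s = l2_normalize(student)
        t = l2_normalize(teacher)
        # Average the absolute difference over the embedding dimensions.
        total += sum(abs(a - b) for a, b in zip(s, t)) / len(s)
    # Average over all regions.
    return total / len(student_embeddings)
```

Because both sides are normalised, the loss depends only on the direction of each embedding: a student embedding that is a positive multiple of the teacher's yields zero loss. In a detector, a term like this would typically be added to the usual classification and box-regression losses with a weighting factor, which is again an assumption rather than a detail given in the abstract.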
Pages: 153-167
Page count: 15