Few-Shot Object Detection by Knowledge Distillation Using Bag-of-Visual-Words Representations

Times cited: 10

Authors:

Pei, Wenjie [2]

Wu, Shuang [2]

Mei, Dianwen [2]

Chen, Fanglin [2]

Tian, Jiandong [3]

Lu, Guangming [1,2]

Affiliations:

[1] Guangdong Prov Key Lab Novel Secur Intelligence T, Shenzhen, Peoples R China

[2] Harbin Inst Technol, Shenzhen, Peoples R China

[3] Chinese Acad Sci, Shenyang Inst Automat, Shenyang, Peoples R China

Source:

COMPUTER VISION, ECCV 2022, PT X | 2022 / Vol. 13670

Keywords:

Few-shot object detection; Bag of visual words; Knowledge distillation;

DOI:

10.1007/978-3-031-20080-9_17

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

While fine-tuning-based methods for few-shot object detection have achieved remarkable progress, a crucial challenge that has not been addressed well is potential class-specific overfitting on base classes and sample-specific overfitting on novel classes. In this work, we design a novel knowledge distillation framework to guide the learning of the object detector and thereby restrain overfitting in both the pre-training stage on base classes and the fine-tuning stage on novel classes. Specifically, we first present a novel Position-Aware Bag-of-Visual-Words model for learning a representative bag of visual words (BoVW) from a small image set, which is used to encode general images based on the similarities between the learned visual words and an image. We then perform knowledge distillation based on the principle that an image should have consistent BoVW representations in two different feature spaces. To this end, we pre-learn a feature space independently of the object detection task and encode images using BoVW in this space. The resulting BoVW representation of an image can be regarded as distilled knowledge that guides the learning of the object detector: the features extracted by the object detector for the same image are expected to yield BoVW representations consistent with the distilled knowledge. Extensive experiments validate the effectiveness of our method and demonstrate its superiority over other state-of-the-art methods.
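The consistency idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`bovw_encode`, `distillation_loss`), the cosine-similarity soft assignment, the temperature value, the use of KL divergence, and the random stand-in features are all assumptions made for illustration, and the position-aware component of the paper's model is omitted entirely.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bovw_encode(features, words, temperature=0.1):
    """Encode an image as a distribution over K visual words.

    features: (N, D) local features of one image (hypothetical input).
    words:    (K, D) visual words living in the same feature space.
    Assumption: soft assignment by cosine similarity, averaged over locations.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = words / np.linalg.norm(words, axis=1, keepdims=True)
    sim = f @ w.T                                # (N, K) cosine similarities
    assign = softmax(sim / temperature, axis=1)  # per-location word distribution
    return assign.mean(axis=0)                   # (K,) histogram that sums to 1

def distillation_loss(teacher_bovw, student_bovw, eps=1e-8):
    """KL(teacher || student) between two BoVW histograms (assumed loss)."""
    t = teacher_bovw + eps
    s = student_bovw + eps
    return float(np.sum(t * np.log(t / s)))

# Random stand-ins for the two feature spaces of the same image.
rng = np.random.default_rng(0)
teacher_feats = rng.normal(size=(49, 16))   # pre-learned feature space
student_feats = rng.normal(size=(49, 32))   # detector feature space
teacher_words = rng.normal(size=(8, 16))    # visual words in the teacher space
student_words = rng.normal(size=(8, 32))    # the same 8 words in the student space

t = bovw_encode(teacher_feats, teacher_words)
s = bovw_encode(student_feats, student_words)
loss = distillation_loss(t, s)
```

In this sketch the two feature spaces may have different dimensionality; only the K-dimensional BoVW histograms are compared, which is what lets a representation from an independently learned space act as a target for the detector's features.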


Pages: 283-299

Number of pages: 17

