Extreme R-CNN: Few-Shot Object Detection via Sample Synthesis and Knowledge Distillation

Cited by: 1
Authors
Zhang, Shenyong [1 ,2 ]
Wang, Wenmin [1 ]
Wang, Zhibing [1 ]
Li, Honglei [1 ,3 ]
Li, Ruochen [1 ,4 ]
Zhang, Shixiong [1 ]
Affiliations
[1] Macau Univ Sci & Technol, Sch Comp Sci & Engn, Macau 999078, Peoples R China
[2] Beijing Inst Technol, Sch Comp Technol, Zhuhai 519088, Peoples R China
[3] Chongqing Polytech Univ Elect Technol, Artificial Intelligence & Big Data Coll, Chongqing 401331, Peoples R China
[4] Guangdong BOHUA UHD Video Innovat Ctr Co Ltd, Shenzhen 518172, Peoples R China
Keywords
two-stage fine-tuning approach; few-shot object detection; synthesizing sample; knowledge distillation; triplet loss;
DOI
10.3390/s24237833
CLC number
O65 [Analytical Chemistry];
Discipline codes
070302; 081704
Abstract
Traditional object detectors require extensive instance-level annotations for training. Conversely, few-shot object detectors, which are generally fine-tuned using limited data from unknown classes, tend to show biases toward base categories and are susceptible to variations within these unknown samples. To mitigate these challenges, we introduce a Two-Stage Fine-Tuning Approach (TFA) named Extreme R-CNN, designed to operate effectively with extremely limited original samples through the integration of sample synthesis and knowledge distillation. Our approach involves synthesizing new training examples via instance clipping and employing various data-augmentation techniques. We enhance the Faster R-CNN architecture by decoupling the regression and classification components of the Region of Interest (RoI), allowing synthetic samples to train the classification head independently of the object-localization process. Comprehensive evaluations on the Microsoft COCO and PASCAL VOC datasets demonstrate significant improvements over baseline methods. Specifically, on the PASCAL VOC dataset, the average precision for novel categories is enhanced by up to 15 percent, while on the more complex Microsoft COCO benchmark it is enhanced by up to 6.1 percent. Remarkably, in the 1-shot scenario, the AP50 of our model exceeds that of the baseline model in the 10-shot setting within the PASCAL VOC dataset, confirming the efficacy of our proposed method.
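The sample-synthesis step summarized in the abstract (clipping annotated instances out of training images and augmenting them to train the classification head separately from localization) can be sketched roughly as below. This is a minimal illustration, not the paper's pipeline: the function names and the specific augmentations (horizontal flip, brightness jitter) are assumptions for the sake of the example.

```python
import numpy as np

def clip_instances(image, boxes):
    """Crop each annotated instance, given boxes as (x1, y1, x2, y2)."""
    return [image[y1:y2, x1:x2] for (x1, y1, x2, y2) in boxes]

def augment(patch, rng):
    """Synthesize variants of one instance patch: original, horizontal
    flip, and brightness jitter (illustrative augmentations only)."""
    flipped = patch[:, ::-1]
    jittered = np.clip(
        patch.astype(np.float32) * rng.uniform(0.8, 1.2), 0, 255
    ).astype(patch.dtype)
    return [patch, flipped, jittered]

def synthesize_samples(image, boxes, seed=0):
    """Expand a few original annotations into a larger set of
    classification-only training samples."""
    rng = np.random.default_rng(seed)
    samples = []
    for patch in clip_instances(image, boxes):
        samples.extend(augment(patch, rng))
    return samples
```

Because the synthesized patches carry class labels but no meaningful box geometry, they can only supervise a classification head, which is why the abstract's decoupling of the RoI classification and regression branches matters.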
Pages: 14