Few-shot object detection via class encoding and multi-target decoding

Cited by: 1
Authors
Guo, Xueqiang [1]
Yang, Hanqing [1]
Wei, Mohan [1]
Ye, Xiaotong [1]
Zhang, Yu [1,2]
Affiliations
[1] Zhejiang Univ, Coll Control Sci & Engn, State Key Lab Ind Control Technol, Hangzhou, Peoples R China
[2] Key Lab Collaborat Sensing & Autonomous Unmanned S, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Class Margin; Few-Shot Object Detection; Multi-Target; Transformer;
DOI
10.1049/csy2.12088
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
The task of few-shot object detection is to classify and locate objects using only a few annotated samples. Although many studies have attempted to solve this problem, the results remain unsatisfactory. Recent studies have found that the class margin significantly affects the classification and representation of the targets to be detected. Most methods use the loss function to balance the class margin, but results show that loss-based methods yield only a marginal improvement on few-shot object detection. In this study, the authors propose a transformer-based class encoding method to balance the class margin, which makes the model attend to the essential information in the features and thus improves its ability to recognise samples. In addition, the authors propose a multi-target decoding method that aggregates the RoI vectors generated from multi-target images with multiple support vectors, which significantly improves the detector's performance on multi-target images. Experiments on the Pascal Visual Object Classes (VOC) and Microsoft Common Objects in Context (COCO) datasets show that the proposed Few-Shot Object Detection via Class Encoding and Multi-Target Decoding significantly improves upon baseline detectors (the average accuracy improvement is up to 10.8% on VOC and 2.1% on COCO), achieving competitive performance. In general, the authors propose a new way to regulate the class margin between support-set vectors and a feature-aggregation method for images containing multiple objects, achieving remarkable results. The method is implemented on mmfewshot, and the code will be made available later.
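The following is a minimal, illustrative PyTorch sketch of the two ideas described in the abstract: self-attention over support-class vectors ("class encoding") and cross-attention that aggregates RoI vectors from a multi-target image with the encoded support vectors ("multi-target decoding"). All module names, dimensions, and hyper-parameters here (ClassEncoder, MultiTargetDecoder, feature dimension 256, etc.) are assumptions for illustration only; this is not the authors' released mmfewshot implementation.

```python
# Hedged sketch: an assumed reading of "class encoding" and "multi-target
# decoding" from the abstract, not the authors' actual code.
import torch
import torch.nn as nn


class ClassEncoder(nn.Module):
    """Self-attention over the support-class prototype vectors.

    Letting every class vector attend to all others is one way to spread
    the class margins in the embedding space (assumed interpretation).
    """

    def __init__(self, dim: int = 256, heads: int = 8, layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, support: torch.Tensor) -> torch.Tensor:
        # support: (B, num_classes, dim) class prototypes from the support set
        return self.encoder(support)


class MultiTargetDecoder(nn.Module):
    """Cross-attention from RoI vectors (queries) to encoded class vectors.

    Each RoI from a multi-target query image aggregates information from
    every support class, so one pass can serve several objects at once.
    """

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rois: torch.Tensor, classes: torch.Tensor) -> torch.Tensor:
        # rois:    (B, num_rois, dim)    RoI-pooled features of the query image
        # classes: (B, num_classes, dim) output of ClassEncoder
        attended, _ = self.cross_attn(query=rois, key=classes, value=classes)
        return self.norm(rois + attended)  # residual aggregation


if __name__ == "__main__":
    B, C, R, D = 2, 20, 100, 256           # batch, classes, RoIs, feature dim
    support = torch.randn(B, C, D)
    rois = torch.randn(B, R, D)
    encoded = ClassEncoder(D)(support)      # margin-balanced class vectors
    fused = MultiTargetDecoder(D)(rois, encoded)
    print(fused.shape)                      # torch.Size([2, 100, 256])
```

In this sketch the fused RoI features would then feed the usual classification and box-regression heads of the base detector; the specific head design and training losses are left out because the abstract does not specify them.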
Pages: 14