W2F: A Weakly-Supervised to Fully-Supervised Framework for Object Detection

Cited by: 60
Authors
Zhang, Yongqiang [1 ,2 ]
Bai, Yancheng [1 ,3 ]
Ding, Mingli [2 ]
Li, Yongqiang [2 ]
Ghanem, Bernard [1 ]
Affiliations
[1] KAUST, Visual Comp Ctr, Thuwal, Saudi Arabia
[2] HIT, Sch Elect Engn & Automat, Harbin, Heilongjiang, Peoples R China
[3] Chinese Acad Sci, Inst Software, Beijing, Peoples R China
Source
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2018
DOI
10.1109/CVPR.2018.00103
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Weakly-supervised object detection has attracted much attention lately, since it does not require bounding box annotations for training. Although significant progress has been made, there is still a large gap in performance between weakly-supervised and fully-supervised object detection. Recently, some works have used pseudo ground-truths generated by a weakly-supervised detector to train a supervised detector. Such approaches tend to find only the most representative parts of objects, and seek only one ground-truth box per class even when many same-class instances exist. To overcome these issues, we propose a weakly-supervised to fully-supervised framework, in which a weakly-supervised detector is first implemented using multiple instance learning. Then, we propose a pseudo ground-truth excavation (PGE) algorithm to find the pseudo ground-truth of each instance in the image. Moreover, a pseudo ground-truth adaptation (PGA) algorithm is designed to further refine the pseudo ground-truths from PGE. Finally, we use these pseudo ground-truths to train a fully-supervised detector. Extensive experiments on the challenging PASCAL VOC 2007 and 2012 benchmarks demonstrate the effectiveness of our framework. We obtain 52.4% and 47.8% mAP on VOC2007 and VOC2012 respectively, a significant improvement over previous state-of-the-art methods.
Pages: 928-936
Number of pages: 9
Related papers
36 in total
[1]  
[Anonymous], 2016, CVPR, DOI 10.1109/CVPR.2016.382
[2]   Weakly Supervised Deep Detection Networks [J].
Bilen, Hakan ;
Vedaldi, Andrea .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :2846-2854
[3]  
Bilen H, 2015, PROC CVPR IEEE, P1081, DOI 10.1109/CVPR.2015.7298711
[4]   Webly Supervised Learning of Convolutional Networks [J].
Chen, Xinlei ;
Gupta, Abhinav .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :1431-1439
[5]   Multi-fold MIL Training for Weakly Supervised Object Localization [J].
Cinbis, Ramazan Gokberk ;
Verbeek, Jakob ;
Schmid, Cordelia .
2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, :2409-2416
[6]  
Cinbis RG, 2017, TPAMI, V39, P189, DOI 10.1109/TPAMI.2016.2535231
[7]  
Dai J., 2016, ADV NEURAL INFORM PR, V29, P379, DOI 10.48550/ARXIV.1605.06409
[8]  
Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
[9]   Weakly Supervised Localization and Learning with Generic Knowledge [J].
Deselaers, Thomas ;
Alexe, Bogdan ;
Ferrari, Vittorio .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2012, 100 (03) :275-293
[10]   The Pascal Visual Object Classes (VOC) Challenge [J].
Everingham, Mark ;
Van Gool, Luc ;
Williams, Christopher K. I. ;
Winn, John ;
Zisserman, Andrew .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2010, 88 (02) :303-338