Open Vocabulary Object Detection with Pseudo Bounding-Box Labels

Cited by: 46

Authors:

Gao, Mingfei [1]

Xing, Chen [1]

Niebles, Juan Carlos [1]

Li, Junnan [1]

Xu, Ran [1]

Liu, Wenhao [1]

Xiong, Caiming [1]

Affiliations:

[1] Salesforce Res, Palo Alto, CA 94301 USA

Source:

COMPUTER VISION, ECCV 2022, PT X | 2022 / Vol. 13670

Keywords:

Open vocabulary detection; Pseudo bounding-box labels;

DOI:

10.1007/978-3-031-20080-9_16

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Despite great progress in object detection, most existing methods work only on a limited set of object categories, due to the tremendous human effort needed for bounding-box annotations of training data. To alleviate the problem, recent open vocabulary and zero-shot detection methods attempt to detect novel object categories beyond those seen during training. They achieve this goal by training on a predefined set of base categories to induce generalization to novel objects. However, their potential is still constrained by the small set of base categories available for training. To enlarge the set of base classes, we propose a method to automatically generate pseudo bounding-box annotations of diverse objects from large-scale image-caption pairs. Our method leverages the localization ability of pre-trained vision-language models to generate pseudo bounding-box labels and then directly uses them for training object detectors. Experimental results show that our method outperforms the state-of-the-art open vocabulary detector by 8% AP on COCO novel categories, by 6.3% AP on PASCAL VOC, by 2.3% AP on Objects365 and by 2.8% AP on LVIS.


Pages: 266-282

Number of pages: 17
