High-Quality R-CNN Object Detection Using Multi-Path Detection Calibration Network

Cited by: 53

Authors:

Chen, Xiaoyu [1]

Li, Hongliang [1]

Wu, Qingbo [1]

Ngan, King Ngi [1]

Xu, Linfeng [1]

Affiliations:

[1] University of Electronic Science and Technology of China, School of Information and Communication Engineering, Chengdu 611731, People's Republic of China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2021 / Vol. 31 / No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

Convolutional neural networks (CNNs); deep learning; object detection; object recognition;

DOI:

10.1109/TCSVT.2020.2987465

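Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract: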

Object proposals are used in two-stage detectors, such as R-CNN, to generate detection results, including category predictions and refined bounding-boxes. As a result, classification scores are assigned to refined bounding-boxes rather than to the object proposals they were computed from. However, this procedure ignores the discrepancy in data distribution between object proposals and refined bounding-boxes. We argue that this discrepancy could limit detection accuracy. Specifically, the foreground/background imbalance among object proposals and inaccurate information from low-IoU proposals could hinder category prediction. In this paper, we propose a detector called the Multi-Path Detection Calibration Network (PDC-Net) to address this problem. The key idea behind PDC-Net is to calibrate the detection results from R-CNN by considering the statistical discrepancy between object proposals and refined bounding-boxes. PDC-Net is built on Faster R-CNN. The core component of PDC-Net is the multi-path detection head, in which the base detector (from Faster R-CNN) generates detection results from object proposals and multiple calibration detectors fix incorrect outputs from the base detector using refined bounding-boxes. Experiments show that PDC-Net improves detection results. Our method reaches 83.1% and 43.3% mAP on the PASCAL VOC and MSCOCO benchmarks, respectively, which is comparable to several state-of-the-art methods.
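The proposal/refined-box discrepancy described in the abstract can be illustrated with a small numeric sketch. This is not the PDC-Net implementation: the box decoding below follows the commonly used R-CNN (dx, dy, dw, dh) parameterization, and the ground-truth box, proposal, and regression deltas are made-up values chosen only to show how a low-IoU proposal and its refined bounding-box can overlap the object very differently.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def apply_deltas(box, deltas):
    """Decode regression deltas (dx, dy, dw, dh) relative to a proposal box."""
    w, h = box[2] - box[0], box[3] - box[1]
    cx, cy = box[0] + 0.5 * w, box[1] + 0.5 * h
    dx, dy, dw, dh = deltas
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)

# Hypothetical values, for illustration only.
ground_truth = (10.0, 10.0, 50.0, 50.0)
proposal = (20.0, 20.0, 60.0, 60.0)
deltas = (-0.25, -0.25, 0.0, 0.0)  # assumed output of a box regressor

refined = apply_deltas(proposal, deltas)
proposal_iou = iou(proposal, ground_truth)  # overlap of the raw proposal
refined_iou = iou(refined, ground_truth)    # overlap after refinement

print(f"proposal IoU: {proposal_iou:.3f}, refined IoU: {refined_iou:.3f}")
```

In this toy case the classification score would be computed from features of the poorly aligned proposal but reported for the well-aligned refined box, which is the mismatch the calibration detectors are described as addressing.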


Pages: 715-727

Page count: 13

