Bounding Box Regression with Uncertainty for Accurate Object Detection

Cited by: 422
Authors
He, Yihui [1]
Zhu, Chenchen [1]
Wang, Jianren [1]
Savvides, Marios [1]
Zhang, Xiangyu [2]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Megvii Inc Face, Beijing, Peoples R China
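Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
Funding
National Key R&D Program of China;
Keywords
DOI
10.1109/CVPR.2019.00300
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract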
Large-scale object detection datasets (e.g., MS-COCO) try to define the ground truth bounding boxes as clearly as possible. However, we observe that ambiguities are still introduced when labeling the bounding boxes. In this paper, we propose a novel bounding box regression loss for learning bounding box transformation and localization variance together. Our loss greatly improves the localization accuracies of various architectures with nearly no additional computation. The learned localization variance allows us to merge neighboring bounding boxes during non-maximum suppression (NMS), which further improves the localization performance. On MS-COCO, we boost the Average Precision (AP) of VGG-16 Faster R-CNN from 23.6% to 29.1%. More importantly, for ResNet-50-FPN Mask R-CNN, our method improves the AP and AP(90) by 1.8% and 6.2% respectively, which significantly outperforms previous state-of-the-art bounding box refinement methods. Our code and models are available at github.com/yihui-he/KL-Loss
Pages: 2883-2892
Page count: 10
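
The abstract describes a regression loss that learns a box offset and its localization variance together. The sketch below illustrates one common form of such a loss for a single box coordinate. It is an assumption based on the general formulation of this kind of loss, not text from this record: the function name `kl_regression_loss`, the parameter names, and the threshold of 1.0 between the quadratic and linear regions are all illustrative, and `alpha` is assumed to be the predicted log-variance, log(sigma^2).

```python
import math


def kl_regression_loss(x_pred, x_target, alpha):
    """Loss for one box coordinate with a predicted log-variance.

    x_pred   -- predicted coordinate offset
    x_target -- ground-truth coordinate offset
    alpha    -- predicted log(sigma^2); hypothetical parameterization
    """
    diff = abs(x_target - x_pred)
    if diff <= 1.0:
        # Quadratic region: half the squared error.
        fit = 0.5 * diff * diff
    else:
        # Linear region for large errors, as in a smooth-L1 loss.
        fit = diff - 0.5
    # exp(-alpha) shrinks the error term when the predicted variance is
    # large; the alpha / 2 term penalizes predicting a large variance.
    return math.exp(-alpha) * fit + 0.5 * alpha


if __name__ == "__main__":
    # Same error, two different predicted variances.
    print(kl_regression_loss(0.0, 2.0, alpha=0.0))
    print(kl_regression_loss(0.0, 2.0, alpha=math.log(4.0)))
```

With `alpha = 0` (unit variance) the function reduces to an ordinary smooth-L1 loss, which is one way to sanity-check an implementation against an existing detector.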