Focal Loss for Dense Object Detection

被引：6539

作者：

Lin, Tsung-Yi ^{[1
]}

Goyal, Priya ^{[1
]}

Girshick, Ross ^{[1
]}

He, Kaiming ^{[1
]}

Dollar, Piotr ^{[1
]}

机构：

[1] Facebook AI Res, Menlo Pk, CA 94025 USA

来源：

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2020年 / 42卷 / 02期

关键词：

Detectors; Training; Object detection; Entropy; Proposals; Convolutional neural networks; Feature extraction; Computer vision; object detection; machine learning; convolutional neural networks;

D O I：

10.1109/TPAMI.2018.2858826

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case. We discover that the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause. We propose to address this class imbalance by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples. Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. To evaluate the effectiveness of our loss, we design and train a simple dense detector we call RetinaNet. Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors. Code is at: https://github.com/facebookresearch/Detectron.

引用

页码：318 / 327

页数：10

共 40 条

[11]

Dai J, 2016, PROCEEDINGS 2016 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY (ICIT), P1796, DOI 10.1109/ICIT.2016.7475036

[12] Histograms of oriented gradients for human detection [J].

Dalal, N ;

Triggs, B .

2005 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2005, :886-893

[13]

Dollar P., 2009, Bmvc, P5

[14] Scalable Object Detection using Deep Neural Networks [J].

Erhan, Dumitru ;

Szegedy, Christian ;

Toshev, Alexander ;

Anguelov, Dragomir .

2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, :2155-2162

[15]

Everingham M., 2007, International journal of computer vision, DOI DOI 10.1007/s11263-009-0275-4

[16] Cascade Object Detection with Deformable Part Models [J].

Felzenszwalb, Pedro F. ;

Girshick, Ross B. ;

McAllester, David .

2010 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2010, :2241-2248

[17] Fast R-CNN [J].

Girshick, Ross .

2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :1440-1448

[18] Rich feature hierarchies for accurate object detection and semantic segmentation [J].

Girshick, Ross ;

Donahue, Jeff ;

Darrell, Trevor ;

Malik, Jitendra .

2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, :580-587

[19]

He KM, 2017, IEEE I CONF COMP VIS, P2980, DOI [10.1109/ICCV.2017.322, 10.1109/TPAMI.2018.2844175]

[20] Deep Residual Learning for Image Recognition [J].

He, Kaiming ;

Zhang, Xiangyu ;

Ren, Shaoqing ;

Sun, Jian .

2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :770-778

← 1 2 3 4 →