Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training

Cited by: 410
Authors
Zhang, Hongkai [1, 2]
Chang, Hong [1, 2]
Ma, Bingpeng [2]
Wang, Naiyan [3]
Chen, Xilin [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc Chinese Acad Sci, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] TuSimple, San Diego, CA USA
Source
COMPUTER VISION - ECCV 2020, PT XV | 2020 / Vol. 12360
Funding
Beijing Natural Science Foundation;
Keywords
Dynamic training; High quality object detection;
DOI
10.1007/978-3-030-58555-6_16
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Although two-stage object detectors have continuously advanced state-of-the-art performance in recent years, the training process itself is far from crystal clear. In this work, we first point out the inconsistency between the fixed network settings and the dynamic training procedure, which greatly affects performance. For example, a fixed label assignment strategy and a fixed regression loss function cannot adapt to the changing distribution of proposals during training and are thus harmful to training high-quality detectors. Consequently, we propose Dynamic R-CNN, which automatically adjusts the label assignment criterion (the IoU threshold) and the shape of the regression loss function (the parameters of the SmoothL1 loss) based on the statistics of proposals during training. This dynamic design makes better use of the training samples and pushes the detector to fit more high-quality samples. Specifically, our method improves upon the ResNet-50-FPN baseline by 1.9% AP and 5.5% AP90 on the MS COCO dataset with no extra overhead. Code and models are available at https://github.com/hkzhang95/DynamicRCNN.
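The abstract describes two quantities that are updated from proposal statistics during training: the IoU threshold used for label assignment and the transition point (beta) of the SmoothL1 loss. The following is a minimal, framework-free sketch of that idea, not the authors' implementation. It assumes our reading of the paper: per image, the threshold statistic is the K-th largest proposal IoU and the beta statistic is the K-th smallest absolute regression target; every `update_every` iterations the threshold is set to the mean and beta to the median of the collected values. The class and parameter names (`DynamicTrainingState`, `k_iou`, `k_beta`, `update_every`) and all default values are hypothetical and only illustrative.

```python
from statistics import mean, median


def smooth_l1(x: float, beta: float) -> float:
    """SmoothL1 with transition point beta: quadratic for |x| < beta, linear otherwise."""
    ax = abs(x)
    if beta <= 0:
        # Limit as beta -> 0: the loss reduces to plain L1.
        return ax
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta


def assign_labels(ious, threshold):
    """Label a proposal positive (1) if its IoU with the matched GT box >= threshold, else 0."""
    return [1 if iou >= threshold else 0 for iou in ious]


class DynamicTrainingState:
    """Hypothetical holder for the running statistics that drive the dynamic updates."""

    def __init__(self, iou_thr=0.5, beta=1.0, k_iou=75, k_beta=10, update_every=100):
        self.iou_thr = iou_thr          # current label-assignment IoU threshold
        self.beta = beta                # current SmoothL1 transition point
        self.k_iou = k_iou              # assumed: use the K-th largest IoU per image
        self.k_beta = k_beta            # assumed: use the K-th smallest |target| per image
        self.update_every = update_every
        self._iou_stats = []
        self._err_stats = []
        self._iter = 0

    def record(self, proposal_ious, regression_targets):
        """Collect one image's statistics (IoUs of proposals, targets of positive proposals)."""
        ious = sorted(proposal_ious, reverse=True)
        if ious:
            # K-th largest IoU, clamped if the image has fewer than K proposals.
            self._iou_stats.append(ious[min(self.k_iou, len(ious)) - 1])
        errs = sorted(abs(t) for t in regression_targets)
        if errs:
            # K-th smallest absolute regression target, clamped the same way.
            self._err_stats.append(errs[min(self.k_beta, len(errs)) - 1])

    def step(self):
        """Advance one iteration; refresh the threshold and beta every `update_every` steps."""
        self._iter += 1
        if self._iter % self.update_every == 0:
            if self._iou_stats:
                self.iou_thr = mean(self._iou_stats)
            if self._err_stats:
                self.beta = median(self._err_stats)
            self._iou_stats.clear()
            self._err_stats.clear()
```

In a training loop, one would call `record` for each image after proposals are matched to ground truth, use `state.iou_thr` in `assign_labels` and `state.beta` in `smooth_l1`, and call `step` once per iteration. As proposals improve, the K-th largest IoU rises (raising the threshold) and the regression targets shrink (lowering beta), which is the adaptive behavior the abstract describes.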
Pages: 260-275
Number of pages: 16