Revisiting the Sibling Head in Object Detector

Cited by: 331
Authors
Song, Guanglu [1]
Liu, Yu [2]
Wang, Xiaogang [2]
Affiliations
[1] SenseTime X Lab, Hong Kong, People's Republic of China
[2] Chinese University of Hong Kong, Hong Kong, People's Republic of China
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020) | 2020
DOI
10.1109/CVPR42600.2020.01158
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The "shared head for classification and localization" (sibling head), firstly denominated in Fast RCNN [9], has been leading the fashion of the object detection community in the past five years. This paper provides the observation that the spatial misalignment between the two object functions in the sibling head can considerably hurt the training process, but this misalignment can be resolved by a very simple operator called task-aware spatial disentanglement (TSD). Considering the classification and regression, TSD decouples them from the spatial dimension by generating two disentangled proposals for them, which are estimated by the shared proposal. This is inspired by the natural insight that for one instance, the features in some salient area may have rich information for classification while those around the boundary may be good at bounding box regression. Surprisingly, this simple design can boost all backbones and models on both MS COCO and Google OpenImage consistently by ~3% mAP. Further, we propose a progressive constraint to enlarge the performance margin between the disentangled and the shared proposals, and gain ~1% more mAP. We show the TSD breaks through the upper bound of nowadays single-model detector by a large margin (mAP 49.4 with ResNet-101, 51.2 with SENet154), and is the core model of our 1st place solution on the Google OpenImage Challenge 2019.
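The idea in the abstract can be illustrated with a minimal sketch, not the authors' implementation. It assumes that each task-specific proposal is a translated copy of the shared proposal and that the progressive constraint takes the form of a hinge margin on a per-task score. The abstract specifies neither detail, so the names `shift_proposal`, `disentangle`, `margin_penalty`, the scale factor `gamma`, and the `margin` value are all hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def shift_proposal(box: Box, dx: float, dy: float, gamma: float = 0.1) -> Box:
    """Translate a box by (dx, dy), scaled by gamma and the box size.

    Assumption: a plain translation stands in for whatever mapping
    is actually used to derive a task-specific proposal.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    sx, sy = gamma * dx * w, gamma * dy * h
    return (x1 + sx, y1 + sy, x2 + sx, y2 + sy)


def disentangle(shared: Box,
                cls_offset: Tuple[float, float],
                reg_offset: Tuple[float, float],
                gamma: float = 0.1) -> Tuple[Box, Box]:
    """Derive one proposal per task from the same shared proposal."""
    cls_box = shift_proposal(shared, cls_offset[0], cls_offset[1], gamma=gamma)
    reg_box = shift_proposal(shared, reg_offset[0], reg_offset[1], gamma=gamma)
    return cls_box, reg_box


def margin_penalty(shared_score: float,
                   disentangled_score: float,
                   margin: float = 0.2) -> float:
    """Hinge penalty (an assumed form of the progressive constraint).

    It is zero once the disentangled proposal beats the shared one
    by at least `margin`, and positive otherwise.
    """
    return max(0.0, shared_score - disentangled_score + margin)


if __name__ == "__main__":
    shared = (0.0, 0.0, 100.0, 50.0)
    # Offsets would be predicted by the network; fixed values are used here.
    cls_box, reg_box = disentangle(shared, (1.0, 0.0), (0.0, -1.0))
    print(cls_box, reg_box)
    print(margin_penalty(0.6, 0.7), margin_penalty(0.6, 0.9))
```

In this sketch the classification and regression branches would pool features from `cls_box` and `reg_box` respectively rather than from the shared box, and the penalty would be added to the training loss for each task.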
Pages: 11560 - 11569
Page count: 10
Related papers
42 entries in total
  • [1] Soft-NMS - Improving Object Detection With One Line of Code
    Bodla, Navaneeth
    Singh, Bharat
    Chellappa, Rama
    Davis, Larry S.
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 5562 - 5570
  • [2] Cao Y., 2019, ARXIV
  • [3] MegDet: A Large Mini-Batch Object Detector
    Peng, Chao
    Xiao, Tete
    Li, Zeming
    Jiang, Yuning
    Zhang, Xiangyu
    Jia, Kai
    Yu, Gang
    Sun, Jian
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6181 - 6189
  • [4] Cheng B., 2018, EUR C COMP VIS ECCV
  • [5] Deformable Convolutional Networks
    Dai, Jifeng
    Qi, Haozhi
    Xiong, Yuwen
    Li, Yi
    Zhang, Guodong
    Hu, Han
    Wei, Yichen
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 764 - 773
  • [6] CenterNet: Keypoint Triplets for Object Detection
    Duan, Kaiwen
    Bai, Song
    Xie, Lingxi
    Qi, Honggang
    Huang, Qingming
    Tian, Qi
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 6568 - 6577
  • [7] NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection
    Ghiasi, Golnaz
    Lin, Tsung-Yi
    Le, Quoc V.
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 7029 - 7038
  • [8] Ghiasi G, 2018, ADV NEUR IN, V31
  • [9] Region-Based Convolutional Networks for Accurate Object Detection and Segmentation
    Girshick, Ross
    Donahue, Jeff
    Darrell, Trevor
    Malik, Jitendra
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2016, 38 (01) : 142 - 158
  • [10] Nonparametric Variational Auto-encoders for Hierarchical Representation Learning
    Goyal, Prasoon
    Hu, Zhiting
    Liang, Xiaodan
    Wang, Chenyu
    Xing, Eric P.
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 5104 - 5112