Revisiting the Sibling Head in Object Detector

Cited by: 367

Authors:

Song, Guanglu [1]

Liu, Yu [2]

Wang, Xiaogang [2]

Affiliations:

[1] SenseTime X Lab, Hong Kong, People's Republic of China

[2] Chinese University of Hong Kong, Hong Kong, People's Republic of China

Source:

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020) | 2020

DOI:

10.1109/CVPR42600.2020.01158

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The "shared head for classification and localization" (sibling head), first introduced in Fast RCNN [9], has dominated the object detection community for the past five years. This paper observes that the spatial misalignment between the two objective functions in the sibling head can considerably hurt the training process, and that this misalignment can be resolved by a very simple operator called task-aware spatial disentanglement (TSD). TSD decouples classification and regression in the spatial dimension by generating two disentangled proposals for them, both estimated from the shared proposal. This is inspired by the natural insight that, for one instance, the features in some salient area may carry rich information for classification, while those around the boundary may be better suited to bounding box regression. Surprisingly, this simple design consistently boosts all backbones and models on both MS COCO and Google OpenImage by ~3% mAP. Further, we propose a progressive constraint to enlarge the performance margin between the disentangled and the shared proposals, which gains ~1% more mAP. We show that TSD breaks through the upper bound of current single-model detectors by a large margin (mAP 49.4 with ResNet-101, 51.2 with SENet154), and that it is the core model of our 1st place solution in the Google OpenImage Challenge 2019.
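The two ideas in the abstract (deriving task-specific proposals from one shared proposal, and a margin constraint that rewards the disentangled proposals for outperforming the shared one) can be illustrated with a minimal Python sketch. This is a simplification under stated assumptions, not the authors' implementation: the abstract does not specify how the offsets are parameterized, so the box translation, the scale factor `gamma`, the `margin` value, and all function names here are hypothetical. In a real detector the offsets would be predicted by learned layers rather than passed in by hand.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Offset = Tuple[float, float]             # (dx, dy), in units of box width/height


def shift_proposal(box: Box, offset: Offset, gamma: float = 0.1) -> Box:
    """Translate a box by an offset scaled by gamma and the box size (assumed form)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    sx, sy = gamma * offset[0] * w, gamma * offset[1] * h
    return (x1 + sx, y1 + sy, x2 + sx, y2 + sy)


def disentangle(shared: Box, cls_offset: Offset, reg_offset: Offset,
                gamma: float = 0.1) -> Tuple[Box, Box]:
    """Derive one proposal per task from the same shared proposal."""
    return (shift_proposal(shared, cls_offset, gamma),
            shift_proposal(shared, reg_offset, gamma))


def margin_penalty(shared_score: float, disentangled_score: float,
                   margin: float = 0.2) -> float:
    """Hinge penalty: zero once the disentangled score beats the shared one by `margin`."""
    return max(0.0, shared_score - disentangled_score + margin)


if __name__ == "__main__":
    cls_box, reg_box = disentangle((0.0, 0.0, 10.0, 20.0),
                                   cls_offset=(1.0, 0.0),
                                   reg_offset=(0.0, -1.0),
                                   gamma=0.5)
    print(cls_box, reg_box)
    print(margin_penalty(0.5, 0.5, margin=0.25))
```

The sketch keeps the shared proposal untouched and returns two shifted copies, so each task head can pool features from its own region; the hinge term is only one plausible way to express "enlarge the performance margin".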

Citation

Pages: 11560-11569

Page count: 10

43 references in total

[1] [Anonymous], 2018, P EUR C COMP VIS ECC, DOI 10.1007/978-3-030-01252-649

[2] [Anonymous], 2019, J BIOTECHNOL 10

[3] [Anonymous], ADV NEUR IN

[4] Soft-NMS - Improving Object Detection With One Line of Code [J]. Bodla, Navaneeth; Singh, Bharat; Chellappa, Rama; Davis, Larry S. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 5562-5570

[5] Cao Y., 2019, arXiv

[6] MegDet: A Large Mini-Batch Object Detector [J]. Peng, Chao; Xiao, Tete; Li, Zeming; Jiang, Yuning; Zhang, Xiangyu; Jia, Kai; Yu, Gang; Sun, Jian. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6181-6189

[7] Cheng B., 2018, EUR C COMP VIS ECCV

[8] Deformable Convolutional Networks [J]. Dai, Jifeng; Qi, Haozhi; Xiong, Yuwen; Li, Yi; Zhang, Guodong; Hu, Han; Wei, Yichen. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 764-773

[9] CenterNet: Keypoint Triplets for Object Detection [J]. Duan, Kaiwen; Bai, Song; Xie, Lingxi; Qi, Honggang; Huang, Qingming; Tian, Qi. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 6568-6577

[10] NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection [J]. Ghiasi, Golnaz; Lin, Tsung-Yi; Le, Quoc V. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 7029-7038
