CrabNet: Fully Task-Specific Feature Learning for One-Stage Object Detection

Cited by: 19
Authors
Wang, Hao [1 ]
Wang, Qilong [2 ]
Zhang, Hongzhi [3 ]
Hu, Qinghua [2 ]
Zuo, Wangmeng [3 ]
Affiliations
[1] Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110819, Peoples R China
[2] Tianjin Univ, Coll Intelligence & Comp, Tianjin 300072, Peoples R China
[3] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150001, Peoples R China
Keywords
Location awareness; Task analysis; Feature extraction; Object detection; Representation learning; Detectors; Proposals; convolutional network; feature disentanglement; feature interaction; REPRESENTATION
DOI
10.1109/TIP.2022.3162099
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Object detection is usually solved by learning a deep architecture involving classification and localization tasks, where feature learning for the two tasks is shared through the same backbone model. Recent works have shown that suitably disentangling the classification and localization tasks has great potential to improve detection performance. Despite their promising performance, existing feature disentanglement methods usually suffer from two limitations. First, most of them focus only on disentangling the proposals or prediction heads for the classification and localization tasks after the RPN, giving little consideration to the fact that the features for these two different tasks are actually produced by a shared backbone model before the RPN. Second, they are designed for two-stage detectors and are not applicable to one-stage methods. To overcome these limitations, this paper presents a novel fully task-specific feature learning method for one-stage object detection. Specifically, our method first learns disentangled features for the classification and localization tasks using two separate backbone models, where auxiliary classification and localization heads are inserted at the end of the two backbones to provide fully task-specific features for classification and localization. Then, a feature interaction module is developed to align and fuse the task-specific features, which are further used to produce the final detection result. Experiments on MS COCO show that our proposed method (dubbed CrabNet) achieves clear improvements over its counterparts with only a limited increase in inference time, while performing favorably against state-of-the-art methods.
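The abstract describes a two-backbone design with auxiliary task heads and a feature interaction module. Below is a minimal PyTorch sketch of that idea, inferred from the abstract alone rather than taken from the paper: the module names (`FeatureInteraction`, `TwoBackboneDetector`), channel sizes, anchor count, and the 1x1-conv alignment with additive fusion are all assumptions and may differ from CrabNet's actual architecture.

```python
import torch
import torch.nn as nn


class FeatureInteraction(nn.Module):
    """Aligns and fuses the two task-specific feature maps.

    Assumption: alignment via 1x1 convs followed by element-wise
    addition; the paper's interaction module may be more elaborate.
    """

    def __init__(self, channels):
        super().__init__()
        self.align_cls = nn.Conv2d(channels, channels, kernel_size=1)
        self.align_loc = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, f_cls, f_loc):
        # Fuse the two task views into one shared representation.
        return self.align_cls(f_cls) + self.align_loc(f_loc)


class TwoBackboneDetector(nn.Module):
    """Hypothetical one-stage detector with disentangled backbones."""

    def __init__(self, backbone_cls, backbone_loc, channels,
                 num_classes, num_anchors=9):
        super().__init__()
        self.backbone_cls = backbone_cls  # backbone dedicated to classification
        self.backbone_loc = backbone_loc  # backbone dedicated to localization
        # Auxiliary heads at the end of each backbone, so each backbone's
        # features stay fully task-specific (each aux head would be trained
        # with its own task loss; the loss setup here is an assumption).
        self.aux_cls_head = nn.Conv2d(channels, num_anchors * num_classes, 3, padding=1)
        self.aux_loc_head = nn.Conv2d(channels, num_anchors * 4, 3, padding=1)
        self.interact = FeatureInteraction(channels)
        # Final detection heads operate on the fused features.
        self.cls_head = nn.Conv2d(channels, num_anchors * num_classes, 3, padding=1)
        self.loc_head = nn.Conv2d(channels, num_anchors * 4, 3, padding=1)

    def forward(self, x):
        f_cls = self.backbone_cls(x)          # classification-specific features
        f_loc = self.backbone_loc(x)          # localization-specific features
        aux_cls = self.aux_cls_head(f_cls)    # auxiliary classification output
        aux_loc = self.aux_loc_head(f_loc)    # auxiliary regression output
        fused = self.interact(f_cls, f_loc)   # align and fuse task features
        return self.cls_head(fused), self.loc_head(fused), aux_cls, aux_loc


if __name__ == "__main__":
    # Smoke test with stand-in single-conv "backbones"; any module that
    # maps images to C-channel feature maps would slot in here.
    def make_backbone():
        return nn.Conv2d(3, 64, kernel_size=3, padding=1)

    model = TwoBackboneDetector(make_backbone(), make_backbone(),
                                channels=64, num_classes=80)
    cls_out, loc_out, aux_cls, aux_loc = model(torch.randn(1, 3, 128, 128))
    print(cls_out.shape, loc_out.shape)
```

The point of the sketch is the separation of concerns: each backbone sees only its own task's auxiliary supervision, and cross-task information enters only through the explicit interaction module rather than through shared backbone weights.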
Pages: 2962-2974
Number of pages: 13