RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection

Cited by: 13

Authors:

Zong, Zhuofan [1]

Cao, Qianggang [1]

Leng, Biao [1]

Affiliations:

[1] Beihang University, School of Computer Science and Engineering, Beijing, People's Republic of China

Source:

Proceedings of the 29th ACM International Conference on Multimedia (MM 2021) | 2021

Funding:

National Natural Science Foundation of China

Keywords:

Feature Pyramid Networks; Multi-scale Feature Fusion; Object Detection

DOI:

10.1145/3474085.3475708

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Feature pyramid networks (FPN) are widely used for multi-scale feature fusion in advanced object detection frameworks. Numerous previous works have developed structures for bidirectional feature fusion, all of which are shown to improve detection performance effectively. We observe that these complicated network structures require feature pyramids to be stacked in a fixed order, which lengthens the pipeline and reduces inference speed. Moreover, semantics from non-adjacent levels are diluted in the feature pyramid, since the local fusion operation merges only features at adjacent pyramid levels, one pair after another. To address these issues, we propose a novel architecture named RCNet, which consists of a Reverse Feature Pyramid (RevFP) and a Cross-scale Shift Network (CSN). RevFP uses local bidirectional feature fusion to simplify the bidirectional pyramid inference pipeline. CSN directly propagates representations to both adjacent and non-adjacent levels so that multi-scale features become more correlated. Extensive experiments on the MS COCO dataset demonstrate that RCNet consistently brings significant improvements over both one-stage and two-stage detectors with marginal extra computational overhead. In particular, replacing FPN with our proposed model boosts RetinaNet to 40.2 AP, which is 3.7 points higher than the baseline. On COCO test-dev, RCNet achieves very competitive performance with a single-model, single-scale 50.5 AP.
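The abstract's point that conventional pyramids merge only adjacent levels, one after another, can be illustrated with a minimal sketch. This is not RCNet, RevFP, or CSN; it is a simplified conventional top-down FPN pass, assuming 1-D feature maps stored as Python lists, nearest-neighbour upsampling, and element-wise addition, with the lateral and smoothing convolutions omitted. The names `upsample2x` and `top_down_fusion` and the toy values are hypothetical.

```python
def upsample2x(feature):
    """Nearest-neighbour 2x upsampling of a 1-D feature map."""
    return [v for v in feature for _ in range(2)]


def top_down_fusion(levels):
    """Fuse pyramid levels from coarsest to finest, one adjacent pair at a time.

    `levels` is ordered fine -> coarse; each level is half as long as the
    previous one (an assumption of this toy setup).
    """
    fused = [None] * len(levels)
    fused[-1] = list(levels[-1])  # coarsest level passes through unchanged
    for i in range(len(levels) - 2, -1, -1):
        # Only the adjacent, already-fused coarser level is read here.
        upper = upsample2x(fused[i + 1])
        fused[i] = [a + b for a, b in zip(levels[i], upper)]
    return fused


# Hypothetical toy pyramid: lengths 8, 4, 2 stand in for three backbone levels.
c3 = [1] * 8
c4 = [10] * 4
c5 = [100] * 2
p3, p4, p5 = top_down_fusion([c3, c4, c5])
print(p5)  # [100, 100]
print(p4)  # [110, 110, 110, 110]
print(p3)  # [111, 111, 111, 111, 111, 111, 111, 111]
```

In this sketch the value 100 from the coarsest level reaches `p3` only through `p4`: the loop never reads a non-adjacent level directly, which is the sequential dependency the abstract describes.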

Pages: 5637-5645

Page count: 9

References: 46 in total

