RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection

Citations: 13
Authors
Zong, Zhuofan [1]
Cao, Qianggang [1]
Leng, Biao [1]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021 | 2021
Funding
National Natural Science Foundation of China
Keywords
Feature Pyramid Networks; Multi-scale Feature Fusion; Object Detection
DOI
10.1145/3474085.3475708
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature pyramid networks (FPN) are widely used for multi-scale feature fusion in advanced object detection frameworks. Numerous previous works have developed various structures for bidirectional feature fusion, all of which are shown to improve detection performance effectively. We observe that these complicated network structures require feature pyramids to be stacked in a fixed order, which lengthens the pipeline and reduces inference speed. Moreover, semantics from non-adjacent levels are diluted in the feature pyramid, since the local fusion operation merges only features at adjacent pyramid levels, in a sequential manner. To address these issues, we propose a novel architecture named RCNet, which consists of Reverse Feature Pyramid (RevFP) and Cross-scale Shift Network (CSN). RevFP utilizes local bidirectional feature fusion to simplify the bidirectional pyramid inference pipeline. CSN directly propagates representations to both adjacent and non-adjacent levels to make multi-scale features more correlated. Extensive experiments on the MS COCO dataset demonstrate that RCNet consistently brings significant improvements over both one-stage and two-stage detectors with only slight extra computational overhead. In particular, replacing FPN with our proposed model boosts RetinaNet to 40.2 AP, 3.7 points higher than the baseline. On COCO test-dev, RCNet achieves very competitive performance, with a single-model single-scale 50.5 AP.
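The sequential, adjacent-only merging that the abstract criticizes can be illustrated with a toy sketch of a conventional FPN-style top-down pathway. This is not RCNet, RevFP, or CSN. The function names `upsample2x` and `top_down_fusion` are hypothetical, 1-D lists stand in for 2-D feature maps, and the lateral and output convolutions of a real FPN are omitted.

```python
def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a 1-D feature row."""
    return [v for v in feat for _ in range(2)]


def top_down_fusion(levels):
    """Merge a pyramid from coarsest to finest, one adjacent pair at a time.

    `levels` is ordered finest first; each level is half the length of the
    previous one. Returns the fused levels in the same order.
    """
    fused = [None] * len(levels)
    fused[-1] = list(levels[-1])  # coarsest level passes through unchanged
    for i in range(len(levels) - 2, -1, -1):
        up = upsample2x(fused[i + 1])  # only the adjacent coarser level is used
        fused[i] = [a + b for a, b in zip(levels[i], up)]  # element-wise sum
    return fused


# Toy 1-D pyramid with lengths 8, 4, 2 (finest to coarsest).
pyramid = [[1.0] * 8, [10.0] * 4, [100.0] * 2]
fused = top_down_fusion(pyramid)
print(fused[0])  # finest level after fusion
```

In this sketch the coarsest level reaches the finest one only by first passing through the intermediate level, so each step must wait for the previous one. That dependency chain is the kind of fixed-order pipeline the abstract describes.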
Pages: 5637-5645
Page count: 9