Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors

Times cited: 15
Authors
Zhang, Gongjie [1, 2]
Luo, Zhipeng [1, 3]
Tian, Zichen [1]
Zhang, Jingyi [1]
Zhang, Xiaoqin [4]
Lu, Shijian [1]
Affiliations
[1] Nanyang Technol Univ, S Lab, Singapore, Singapore
[2] Black Sesame Technol, Singapore, Singapore
[3] SenseTime Res, Hong Kong, Peoples R China
[4] Wenzhou Univ, Wenzhou, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023
DOI
10.1109/CVPR52729.2023.00601
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multi-scale features have proven highly effective for object detection but often incur large, sometimes prohibitive, extra computational costs, especially in recent Transformer-based detectors. In this paper, we propose Iterative Multi-scale Feature Aggregation (IMFA), a generic paradigm that enables efficient use of multi-scale features in Transformer-based object detectors. The core idea is to exploit sparse multi-scale features from just a few crucial locations, which is achieved through two novel designs. First, IMFA rearranges the Transformer encoder-decoder pipeline so that the encoded features can be iteratively updated based on the detection predictions. Second, guided by prior detection predictions, IMFA sparsely samples scale-adaptive features from just a few keypoint locations for refined detection. As a result, the sampled multi-scale features are sparse yet still highly beneficial for object detection. Extensive experiments show that IMFA significantly boosts the performance of multiple Transformer-based object detectors with only slight computational overhead.
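The sparse, scale-adaptive sampling step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the level-assignment rule (an FPN-style log2 heuristic), the five keypoints per box (center plus four interior points), the strides, and every function name (`assign_level`, `box_keypoints`, `bilinear_sample`, `sample_sparse_features`, `coordinate_pyramid`) are hypothetical choices made for illustration. Given one predicted box, the sketch picks a single pyramid level from the box size and bilinearly samples features only at a handful of keypoints, instead of reading every location at every scale.

```python
import math
from typing import List, Sequence, Tuple

# A feature map stored as nested lists indexed [y][x][channel].
FeatureMap = List[List[List[float]]]


def assign_level(box_w: float, box_h: float, min_level: int = 0, max_level: int = 3,
                 canonical_size: float = 224.0, canonical_level: int = 2) -> int:
    """Pick one pyramid level from box size with an FPN-style log2 rule (assumption).

    Larger boxes map to coarser levels; the result is clamped to [min_level, max_level].
    """
    size = math.sqrt(max(box_w * box_h, 1e-6))
    level = math.floor(canonical_level + math.log2(size / canonical_size))
    return max(min_level, min(max_level, level))


def box_keypoints(cx: float, cy: float, w: float, h: float) -> List[Tuple[float, float]]:
    """Hypothetical keypoints: the box center plus four points halfway to the corners."""
    offsets = [(sx, sy) for sx in (-1, 1) for sy in (-1, 1)]
    return [(cx, cy)] + [(cx + sx * w / 4, cy + sy * h / 4) for sx, sy in offsets]


def bilinear_sample(fmap: FeatureMap, x: float, y: float) -> List[float]:
    """Bilinearly interpolate a feature vector at continuous feature-map coords (x, y)."""
    h, w = len(fmap), len(fmap[0])
    x = min(max(x, 0.0), w - 1.0)  # clamp to the valid sampling range
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    out = []
    for k in range(len(fmap[0][0])):
        top = fmap[y0][x0][k] * (1 - dx) + fmap[y0][x1][k] * dx
        bottom = fmap[y1][x0][k] * (1 - dx) + fmap[y1][x1][k] * dx
        out.append(top * (1 - dy) + bottom * dy)
    return out


def sample_sparse_features(pyramid: Sequence[FeatureMap], strides: Sequence[int],
                           box: Tuple[float, float, float, float]) -> Tuple[int, List[List[float]]]:
    """Sample features for one predicted box given as (cx, cy, w, h) in image pixels.

    Only the five hypothetical keypoints on a single size-matched level are read,
    rather than every location on every level.
    """
    cx, cy, w, h = box
    level = assign_level(w, h, max_level=len(pyramid) - 1)
    fmap, stride = pyramid[level], strides[level]
    # Map image-pixel coords to feature-map coords (pixel-center convention).
    return level, [bilinear_sample(fmap, px / stride - 0.5, py / stride - 0.5)
                   for px, py in box_keypoints(cx, cy, w, h)]


def coordinate_pyramid(image_size: int, strides: Sequence[int]) -> List[FeatureMap]:
    """Toy pyramid whose 2-channel feature at (x, y) is simply [x, y], for easy checking."""
    return [[[[float(x), float(y)] for x in range(image_size // s)]
             for y in range(image_size // s)] for s in strides]


if __name__ == "__main__":
    strides = [8, 16, 32, 64]
    pyramid = coordinate_pyramid(256, strides)
    level, feats = sample_sparse_features(pyramid, strides, (128.0, 128.0, 64.0, 64.0))
    print(level, feats[0])  # 0 [15.5, 15.5]
```

In this sketch, reading one size-matched level per box is what keeps the sampled set sparse: a 64-pixel box reads five vectors from the finest level, while a 256-pixel box reads five from a coarser one. How IMFA itself selects levels and keypoints may differ.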
Pages: 6206-6216
Page count: 11