Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors

Cited by: 15

Authors:

Zhang, Gongjie [1,2]

Luo, Zhipeng [1,3]

Tian, Zichen [1]

Zhang, Jingyi [1]

Zhang, Xiaoqin [4]

Lu, Shijian [1]

Affiliations:

[1] Nanyang Technological University, S-Lab, Singapore, Singapore

[2] Black Sesame Technologies, Singapore, Singapore

[3] SenseTime Research, Hong Kong, People's Republic of China

[4] Wenzhou University, Wenzhou, People's Republic of China

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023

DOI:

10.1109/CVPR52729.2023.00601

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Multi-scale features have been proven highly effective for object detection but often come with huge and even prohibitive extra computation costs, especially for the recent Transformer-based detectors. In this paper, we propose Iterative Multi-scale Feature Aggregation (IMFA) - a generic paradigm that enables efficient use of multi-scale features in Transformer-based object detectors. The core idea is to exploit sparse multi-scale features from just a few crucial locations, and it is achieved with two novel designs. First, IMFA rearranges the Transformer encoder-decoder pipeline so that the encoded features can be iteratively updated based on the detection predictions. Second, IMFA sparsely samples scale-adaptive features for refined detection from just a few keypoint locations under the guidance of prior detection predictions. As a result, the sampled multi-scale features are sparse yet still highly beneficial for object detection. Extensive experiments show that the proposed IMFA boosts the performance of multiple Transformer-based object detectors significantly yet with only slight computational overhead.
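
The sparse sampling idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the IMFA implementation: the function names `bilinear_sample` and `sample_multiscale`, the normalized keypoint coordinates, and the softmax mixing over pyramid levels via `level_logits` are all assumptions made for illustration. It only shows how features could be gathered from a handful of locations across several scales instead of processing every position at every scale.

```python
import numpy as np


def bilinear_sample(feat, xy):
    """Sample a (C, H, W) feature map at K points.

    xy has shape (K, 2) with normalized (x, y) coordinates in [0, 1].
    Returns an array of shape (K, C).
    """
    _, h, w = feat.shape
    # Map normalized coordinates to pixel-index space.
    x = np.clip(xy[:, 0], 0.0, 1.0) * (w - 1)
    y = np.clip(xy[:, 1], 0.0, 1.0) * (h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0
    wy = y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return (top * (1 - wy) + bottom * wy).T


def sample_multiscale(pyramid, keypoints, level_logits):
    """Gather features at K keypoints from every level and mix them.

    pyramid: list of L arrays, each (C, H_l, W_l), sharing the same C.
    keypoints: (K, 2) normalized coordinates (hypothetical input, e.g.
        derived from an earlier round of box predictions).
    level_logits: (K, L) scores (an assumption) turned into per-keypoint
        softmax weights over the levels.
    Returns an array of shape (K, C).
    """
    per_level = np.stack(
        [bilinear_sample(f, keypoints) for f in pyramid], axis=1
    )  # (K, L, C)
    shifted = level_logits - level_logits.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=1, keepdims=True)  # (K, L)
    return (per_level * weights[:, :, None]).sum(axis=1)


# Toy usage: two pyramid levels, three keypoints, uniform level weights.
rng = np.random.default_rng(0)
pyramid = [rng.normal(size=(4, 16, 16)), rng.normal(size=(4, 8, 8))]
keypoints = np.array([[0.1, 0.2], [0.5, 0.5], [0.9, 0.7]])
tokens = sample_multiscale(pyramid, keypoints, np.zeros((3, 2)))
```

In this toy setup only three locations are read from each level, so the cost of the gather grows with the number of keypoints rather than with the total number of positions across all scales.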

Pages: 6206-6216

Number of pages: 11
