Detection Transformer with Multi-granularity Information

Cited by: 0

Authors:

Lin, Jiaxin [1]

Yu, Yue [2]

Zhao, Haifeng [1]

Ma, Leilei [1]

Affiliations:

[1] Anhui Univ, Sch Comp Sci & Technol, Hefei 230601, Anhui, Peoples R China

[2] Hefei Fengle Seed Co Ltd, Hefei 230601, Anhui, Peoples R China

Source:

PROCEEDINGS OF 2024 INTERNATIONAL CONFERENCE ON POWER ELECTRONICS AND ARTIFICIAL INTELLIGENCE, PEAI 2024 | 2024

Funding:

National Natural Science Foundation of China;

Keywords:

Computer vision; Object detection; Machine learning; Neural networks;

DOI:

10.1145/3674225.3674237

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recent advances in object detection build on DETR-like detectors, which use self-attention to capture long-range dependencies and have attracted considerable attention from researchers. However, these methods often miss fine-grained object details, which leads to small objects being misidentified or omitted and ultimately to sub-optimal performance. To address this challenge, we introduce a novel lightweight plug-and-play component that can be flexibly integrated into the encoder layer to extract fine-grained information and model local relationships. Specifically, our proposed component comprises two modules: Multi-granularity Information Extraction and Multi-granularity Information Fusion. The extraction module employs dilated convolutions and large convolution kernels to capture multi-granularity features through multiple parallel branches. The fusion module then uses self-attention guidance to fuse the information extracted at different granularities. Extensive experiments on the COCO benchmark dataset demonstrate that our proposed method outperforms the state-of-the-art (SOTA) method in both accuracy and efficiency. Moreover, our method yields improvements when incorporated into DETR-like models. We achieve 48.2 AP on COCO detection test-dev using ResNet-DC-R101. Code will be available soon.
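The abstract gives no implementation details, so the following is only a toy sketch of the general idea it describes: several parallel dilated-convolution branches capture context at different granularities, and an attention-style weighting fuses them. It works on a 1-D list rather than an image feature map. All function names, the averaging kernel, the dilation rates, and the softmax-based fusion rule are assumptions made for illustration; they are not the authors' method or code.

```python
import math


def receptive_field(kernel_size, dilation):
    """Span of input positions covered by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1-D dilated convolution (odd kernel length assumed)."""
    half = (len(kernel) - 1) * dilation // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j * dilation - half
            if 0 <= idx < len(signal):  # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_branches(query, branches):
    """Hypothetical fusion: at each position, weight the branch outputs by a
    softmax over query-times-value scores (a stand-in for attention guidance)."""
    fused = []
    for i, q in enumerate(query):
        values = [branch[i] for branch in branches]
        weights = softmax([q * v for v in values])
        fused.append(sum(w * v for w, v in zip(weights, values)))
    return fused


def multi_granularity_block(signal, kernel=(1 / 3, 1 / 3, 1 / 3), dilations=(1, 2, 4)):
    """Run parallel dilated branches over the signal, then fuse them."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return fuse_branches(signal, branches)
```

With a 3-tap kernel, dilation rates 1, 2, and 4 cover 3, 5, and 9 input positions respectively, which is how parallel branches can see fine and coarse context at the same parameter cost. In an actual detector the branches would be 2-D convolutions over encoder feature maps and the fusion would use learned attention rather than this fixed rule.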


Pages: 58-63

Page count: 6
