Detection Transformer with Multi-granularity Information

Cited by: 0
Authors
Lin, Jiaxin [1]
Yu, Yue [2]
Zhao, Haifeng [1]
Ma, Leilei [1]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, Hefei 230601, Anhui, Peoples R China
[2] Hefei Fengle Seed Co Ltd, Hefei 230601, Anhui, Peoples R China
Source
PROCEEDINGS OF 2024 INTERNATIONAL CONFERENCE ON POWER ELECTRONICS AND ARTIFICIAL INTELLIGENCE, PEAI 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
Computer vision; Object detection; Machine learning; Neural networks;
DOI
10.1145/3674225.3674237
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Recent advances in object detection have harnessed DETR-like detectors, which leverage self-attention to capture long-range dependencies, and have drawn considerable attention from researchers. However, these methods often fail to capture fine-grained object details, leading to the misidentification or omission of small objects and, ultimately, to sub-optimal performance. To address this challenge, we introduce a novel plug-and-play lightweight component that can be flexibly integrated into the encoder layer to extract fine-grained information and model local relationships. Specifically, our proposed component comprises two modules: Multi-granularity Information Extraction and Multi-granularity Information Fusion. The information extraction module employs dilated convolutions and large convolution kernels to capture multi-granularity features through multiple parallel branches. Subsequently, the information fusion module uses self-attention guidance to fuse the information extracted at different granularities. Extensive experiments on the COCO benchmark dataset demonstrate that our proposed method outperforms the state-of-the-art (SOTA) method in terms of both accuracy and efficiency. Moreover, our method yields improvements when incorporated into DETR-like models. We achieve 48.2 AP on COCO detection test-dev using ResNet-DC-R101. Code will be available soon.
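The abstract's idea of parallel branches that see the input at different granularities can be illustrated with a small sketch. This is not the authors' implementation: it is a 1-D, pure-Python toy in which the function names, the kernel, and the dilation rates (1, 2, 3) are all assumptions chosen for illustration, and the self-attention-guided fusion step is left out because the abstract does not specify it. It relies only on the standard fact that a kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions.

```python
def effective_kernel_size(kernel_size, dilation):
    """Input span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """1-D cross-correlation with zero 'same' padding and the given dilation.

    Assumes an odd-length kernel so the output aligns with the input.
    """
    k = len(kernel)
    pad = (effective_kernel_size(k, dilation) - 1) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def multi_branch_features(signal, kernel, dilations=(1, 2, 3)):
    """Apply the same kernel at several dilations (hypothetical branch layout)."""
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


# A unit impulse makes each branch's sampling pattern visible in its output.
signal = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
branches = multi_branch_features(signal, kernel=[1.0, 1.0, 1.0])
for d, out in branches.items():
    print(d, effective_kernel_size(3, d), out)
```

With the same three weights, larger dilations spread the response over a wider span, which is how a set of parallel branches can cover fine and coarse context at the same parameter cost per branch.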
Pages: 58-63
Page count: 6