RT-DEKT: real-time object detector with KAN-Transformer

Times Cited: 0
Authors
Jin, Zhanao [1 ]
Li, Changlu [2 ]
Lei, Zhichun [3 ]
Affiliations
[1] Tianjin Univ, Sch Microelect, Tianjin 300072, Peoples R China
[2] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[3] Univ Appl Sci Ruhr West, Inst Measurement Engn & Sensor Technol, D-45479 Mulheim, Germany
Keywords
Object detection; Vision transformer; Attention mechanism; Kolmogorov-Arnold networks
DOI
10.1007/s11760-025-04016-8
CLC Classification
TM [Electrical Engineering]; TN [Electronic & Communication Technology]
Subject Classification
0808; 0809
Abstract
Real-Time Detection Transformer (RT-DETR) is the first Transformer-based detector extended to real-time object detection scenarios. Despite its remarkable accuracy and real-time inference, its backbone pays inadequate attention to feature channels and spatial dimensions, resulting in suboptimal feature quality. To address this limitation, this paper proposes a hybrid backbone with spatial and channel attention mechanisms that strengthen its feature extraction capability. Furthermore, the original Transformer encoder handles high-dimensional, complex image data poorly and offers limited interpretability. To overcome these limitations, this paper introduces a high-level KAN-Transformer encoder built on Kolmogorov-Arnold Networks, which outperform traditional multi-layer perceptrons. The proposed detector is named RT-DEKT, with variants based on different backbone architectures: RT-DEKT-R50 and RT-DEKT-R101. Experiments on the COCO dataset show that RT-DEKT-R50 achieves 53.6% AP at 103 FPS on an NVIDIA T4 GPU, while RT-DEKT-R101 achieves 54.8% AP at 68 FPS. After pretraining on the Objects365 dataset, RT-DEKT-R50 improves to 56.8% AP, and RT-DEKT-R101 to 58.5% AP. Both variants outperform contemporary detectors in accuracy and speed, offering an effective trade-off between efficiency and accuracy.
Pages: 11
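
The record contains no code, so the following is only a minimal PyTorch sketch of the encoder change the abstract describes: a Transformer encoder layer whose MLP feed-forward block is swapped for a small two-layer KAN. All names and sizes (RBFKANLayer, KANTransformerEncoderLayer, num_basis, d_hidden) are illustrative assumptions, and the KAN edge functions are parameterized here with fixed Gaussian radial basis functions rather than the learnable B-splines of the original KAN formulation; the authors' actual RT-DEKT implementation may differ in all of these details.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RBFKANLayer(nn.Module):
    # Simplified KAN layer (assumption, not the paper's code): each input
    # feature is expanded onto a fixed grid of Gaussian radial basis
    # functions, and a linear map mixes the basis responses into the output
    # features. This stands in for the learnable univariate edge functions
    # of a KAN, which the original formulation models with B-splines.
    def __init__(self, in_dim, out_dim, num_basis=8, grid_range=(-2.0, 2.0)):
        super().__init__()
        self.register_buffer(
            "centers", torch.linspace(grid_range[0], grid_range[1], num_basis))
        self.gamma = (num_basis / (grid_range[1] - grid_range[0])) ** 2  # RBF width
        self.spline_linear = nn.Linear(in_dim * num_basis, out_dim)  # mixes basis responses
        self.base_linear = nn.Linear(in_dim, out_dim)                # residual "base" path

    def forward(self, x):  # x: (..., in_dim)
        phi = torch.exp(-self.gamma * (x.unsqueeze(-1) - self.centers) ** 2)
        return self.base_linear(F.silu(x)) + self.spline_linear(phi.flatten(-2))

class KANTransformerEncoderLayer(nn.Module):
    # Post-norm Transformer encoder layer whose MLP feed-forward block is
    # replaced by a two-layer KAN, mirroring the abstract's description of
    # the KAN-Transformer encoder (layer sizes here are illustrative).
    def __init__(self, d_model=256, nhead=8, d_hidden=1024, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout,
                                          batch_first=True)
        self.kan_ffn = nn.Sequential(RBFKANLayer(d_model, d_hidden),
                                     RBFKANLayer(d_hidden, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):  # x: (batch, tokens, d_model)
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + self.drop(attn_out))
        return self.norm2(x + self.drop(self.kan_ffn(x)))

tokens = torch.randn(2, 400, 256)  # e.g. a 20x20 feature map flattened into tokens
print(KANTransformerEncoderLayer()(tokens).shape)  # torch.Size([2, 400, 256])

The fixed RBF expansion is chosen only to keep the sketch short; implementations closer to the original KAN formulation learn per-edge B-spline coefficients and adapt the grid during training.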