RT-DEKT: real-time object detector with KAN-Transformer

Times Cited: 0
Authors
Jin, Zhanao [1 ]
Li, Changlu [2 ]
Lei, Zhichun [3 ]
Affiliations
[1] Tianjin Univ, Sch Microelect, Tianjin 300072, Peoples R China
[2] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[3] Univ Appl Sci Ruhr West, Inst Measurement Engn & Sensor Technol, D-45479 Mulheim, Germany
Keywords
Object detection; Vision transformer; Attention mechanism; Kolmogorov-Arnold networks
DOI
10.1007/s11760-025-04016-8
CLC Classification
TM [Electrical Engineering]; TN [Electronic & Communication Technology]
Subject Classification
0808; 0809
Abstract
Real-Time Detection Transformer (RT-DETR) is the first Transformer-based detector extended to real-time object detection scenarios. Despite its remarkable accuracy and real-time inference, its backbone pays inadequate attention to feature channels and spatial dimensions, resulting in suboptimal feature quality. To address this limitation, this paper proposes a hybrid backbone with spatial and channel attention mechanisms that strengthen its feature extraction capability. Furthermore, the original Transformer encoder handles high-dimensional, complex image data poorly and offers limited interpretability. To overcome these limitations, this paper introduces a high-level KAN-Transformer encoder built on Kolmogorov-Arnold Networks, which outperform traditional multi-layer perceptrons. The proposed detector is named RT-DEKT, with variants based on different backbone architectures: RT-DEKT-R50 and RT-DEKT-R101. Experiments on the COCO dataset show that RT-DEKT-R50 achieves 53.6% AP at 103 FPS on an NVIDIA T4 GPU, while RT-DEKT-R101 achieves 54.8% AP at 68 FPS. After pretraining on the Objects365 dataset, RT-DEKT-R50 improves to 56.8% AP, and RT-DEKT-R101 to 58.5% AP. Both variants outperform contemporary detectors in accuracy and speed, offering an effective trade-off between efficiency and accuracy.
Pages: 11
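
The record contains no code, so the following is only a minimal PyTorch sketch of the encoder change the abstract describes: a Transformer encoder layer whose MLP feed-forward block is swapped for a small two-layer KAN. All names and sizes (RBFKANLayer, KANTransformerEncoderLayer, num_basis, d_hidden) are illustrative assumptions, and the KAN edge functions are parameterized here with fixed Gaussian radial basis functions rather than the learnable B-splines of the original KAN formulation; the authors' actual RT-DEKT implementation may differ in all of these details.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RBFKANLayer(nn.Module):
    # Simplified KAN layer (assumption, not the paper's code): each input
    # feature is expanded onto a fixed grid of Gaussian radial basis
    # functions, and a linear map mixes the basis responses into the output
    # features. This stands in for the learnable univariate edge functions
    # of a KAN, which the original formulation models with B-splines.
    def __init__(self, in_dim, out_dim, num_basis=8, grid_range=(-2.0, 2.0)):
        super().__init__()
        self.register_buffer(
            "centers", torch.linspace(grid_range[0], grid_range[1], num_basis))
        self.gamma = (num_basis / (grid_range[1] - grid_range[0])) ** 2  # RBF width
        self.spline_linear = nn.Linear(in_dim * num_basis, out_dim)  # mixes basis responses
        self.base_linear = nn.Linear(in_dim, out_dim)                # residual "base" path

    def forward(self, x):  # x: (..., in_dim)
        phi = torch.exp(-self.gamma * (x.unsqueeze(-1) - self.centers) ** 2)
        return self.base_linear(F.silu(x)) + self.spline_linear(phi.flatten(-2))

class KANTransformerEncoderLayer(nn.Module):
    # Post-norm Transformer encoder layer whose MLP feed-forward block is
    # replaced by a two-layer KAN, mirroring the abstract's description of
    # the KAN-Transformer encoder (layer sizes here are illustrative).
    def __init__(self, d_model=256, nhead=8, d_hidden=1024, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout,
                                          batch_first=True)
        self.kan_ffn = nn.Sequential(RBFKANLayer(d_model, d_hidden),
                                     RBFKANLayer(d_hidden, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):  # x: (batch, tokens, d_model)
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + self.drop(attn_out))
        return self.norm2(x + self.drop(self.kan_ffn(x)))

tokens = torch.randn(2, 400, 256)  # e.g. a 20x20 feature map flattened into tokens
print(KANTransformerEncoderLayer()(tokens).shape)  # torch.Size([2, 400, 256])

The fixed RBF expansion is chosen only to keep the sketch short; implementations closer to the original KAN formulation learn per-edge B-spline coefficients and adapt the grid during training.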