Lightweight Small Object Detection Algorithm Based on STD-DETR

Cited: 0
Authors
Yin, Zeyu [1 ]
Yang, Bo [2 ]
Chen, Jinling [1 ]
Zhu, Chuangchuang [1 ]
Chen, Hongli [3 ]
Tao, Jin [1 ]
Affiliations
[1] Southwest Petr Univ, Sch Elect Engn & Informat, Chengdu 610500, Sichuan, Peoples R China
[2] State Grid Sichuan Informat & Telecommun Co, Chengdu 610095, Sichuan, Peoples R China
[3] Southwest Petr Univ, Petr Engn Sch, Chengdu 610500, Sichuan, Peoples R China
Keywords
small object detection; real-time DEtection TRansformer; lightweight; feature pyramid; pixel intersection over union;
DOI
10.3788/LOP241849
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Telecommunications];
Discipline Codes
0808; 0809;
Abstract
To address the challenges of small object detection in unmanned aerial vehicle (UAV) aerial images, including complex backgrounds, tiny and dense targets, and the difficulty of deploying models on mobile devices, this paper proposes STD-DETR, an improved lightweight small object detection algorithm based on the real-time DEtection TRansformer (RT-DETR) model. First, RepConv is introduced to improve the lightweight StarNet network, which replaces the original backbone to make the model lightweight. A novel feature pyramid is then designed that adds a 160×160 pixel feature-map output at the P2 layer to enrich small-object information, replacing the traditional approach of adding a P2 small-object detection head; the CSP-OmniKernel-squeeze-excitation (COSE) module and space-to-depth (SPD) convolution are introduced to enhance global feature extraction and multi-scale feature fusion. Finally, pixel intersection over union (PIoU) replaces the original model's loss function, computing IoU at the pixel level to capture small overlap regions more precisely, which reduces the miss rate and improves detection accuracy. Experimental results show that, compared with the baseline model, STD-DETR improves precision, recall, and mAP50 by 1.3, 2.2, and 2.3 percentage points, respectively, on the VisDrone2019 dataset, while reducing computational cost by ~34.0% and parameters by ~37.9%. Generalization tests on the TinyPerson dataset show gains of 3.7 percentage points in precision and 3.1 percentage points in mAP50, confirming the model's effectiveness and generalization capability.
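The space-to-depth (SPD) convolution mentioned in the abstract builds on a lossless downsampling step: instead of discarding pixels via strided convolution or pooling, spatial blocks are rearranged into the channel dimension, preserving fine detail that matters for tiny objects. A minimal NumPy sketch of that rearrangement (the function name and shapes are illustrative, not taken from the paper's code):

```python
import numpy as np

def space_to_depth(x, block=2):
    """Rearrange spatial blocks into channels, losslessly:
    (C, H, W) -> (C * block^2, H / block, W / block)."""
    c, h, w = x.shape
    assert h % block == 0 and w % block == 0, "H and W must be divisible by block"
    # Split each spatial axis into (coarse, fine-within-block) parts.
    x = x.reshape(c, h // block, block, w // block, block)
    # Move the within-block axes next to the channel axis.
    x = x.transpose(0, 2, 4, 1, 3)          # (c, block, block, h/block, w/block)
    # Fold the block axes into channels.
    return x.reshape(c * block * block, h // block, w // block)

# Example: a 1x4x4 map becomes 4x2x2 with every pixel preserved.
x = np.arange(16).reshape(1, 4, 4)
y = space_to_depth(x, block=2)
print(y.shape)  # (4, 2, 2)
```

In SPD-Conv this rearrangement is followed by a non-strided convolution, so resolution is reduced without the information loss of stride-2 downsampling.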
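The pixel-level IoU idea behind PIoU can be illustrated by rasterizing two boxes onto a grid and counting overlapping pixels; for small objects, even a one-pixel overlap then contributes a nonzero score. This is a simplified sketch of the concept on axis-aligned integer boxes, not the paper's actual PIoU loss formulation:

```python
import numpy as np

def pixel_iou(box_a, box_b, size=64):
    """IoU of two (x1, y1, x2, y2) boxes, computed by counting pixels
    on a size x size boolean grid rather than from analytic areas."""
    def rasterize(box):
        m = np.zeros((size, size), dtype=bool)
        x1, y1, x2, y2 = map(int, box)
        m[y1:y2, x1:x2] = True  # mark every pixel covered by the box
        return m

    ma, mb = rasterize(box_a), rasterize(box_b)
    inter = np.logical_and(ma, mb).sum()  # overlapping pixel count
    union = np.logical_or(ma, mb).sum()   # covered pixel count
    return inter / union if union else 0.0

# Example: two 4x4 boxes overlapping on a 2x2 patch -> 4 / 28.
print(pixel_iou((0, 0, 4, 4), (2, 2, 6, 6)))
```

A pixel-counting formulation also extends naturally to non-rectangular or rotated regions, where analytic intersection areas are harder to derive.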
Pages: 11