Lightweight Small Object Detection Algorithm Based on STD-DETR

Cited: 0
Authors
Yin, Zeyu [1 ]
Yang, Bo [2 ]
Chen, Jinling [1 ]
Zhu, Chuangchuang [1 ]
Chen, Hongli [3 ]
Tao, Jin [1 ]
Affiliations
[1] Southwest Petr Univ, Sch Elect Engn & Informat, Chengdu 610500, Sichuan, Peoples R China
[2] State Grid Sichuan Informat & Telecommun Co, Chengdu 610095, Sichuan, Peoples R China
[3] Southwest Petr Univ, Petr Engn Sch, Chengdu 610500, Sichuan, Peoples R China
Keywords
small object detection; real-time DEtection TRansformer; lightweight; feature pyramid; pixel intersection over union;
DOI
10.3788/LOP241849
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Telecommunications];
Discipline Codes
0808; 0809;
Abstract
To address the challenges of small object detection in unmanned aerial vehicle (UAV) aerial images, including complex backgrounds, tiny and dense targets, and the difficulty of deploying models on mobile devices, this paper proposes STD-DETR, an improved lightweight small object detection algorithm based on the real-time DEtection TRansformer (RT-DETR) model. First, RepConv is introduced to improve the lightweight StarNet network, which replaces the original backbone to make the model lightweight. A novel feature pyramid is then designed that adds a 160×160 pixel feature-map output at the P2 layer to enrich small-object information, replacing the traditional approach of adding a P2 small-object detection head; the CSP-OmniKernel-squeeze-excitation (COSE) module and space-to-depth (SPD) convolution are introduced to enhance global feature extraction and multi-scale feature fusion. Finally, pixel intersection over union (PIoU) replaces the original model's loss function, computing IoU at the pixel level to capture small overlap regions more precisely, which reduces the miss rate and improves detection accuracy. Experimental results show that, compared with the baseline model, STD-DETR improves precision, recall, and mAP50 by 1.3, 2.2, and 2.3 percentage points, respectively, on the VisDrone2019 dataset, while reducing computational cost by ~34.0% and parameters by ~37.9%. Generalization tests on the TinyPerson dataset show gains of 3.7 percentage points in precision and 3.1 percentage points in mAP50, confirming the model's effectiveness and generalization capability.
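The space-to-depth (SPD) convolution mentioned in the abstract builds on a lossless downsampling step: instead of discarding pixels via strided convolution or pooling, spatial blocks are rearranged into the channel dimension, preserving fine detail that matters for tiny objects. A minimal NumPy sketch of that rearrangement (the function name and shapes are illustrative, not taken from the paper's code):

```python
import numpy as np

def space_to_depth(x, block=2):
    """Rearrange spatial blocks into channels, losslessly:
    (C, H, W) -> (C * block^2, H / block, W / block)."""
    c, h, w = x.shape
    assert h % block == 0 and w % block == 0, "H and W must be divisible by block"
    # Split each spatial axis into (coarse, fine-within-block) parts.
    x = x.reshape(c, h // block, block, w // block, block)
    # Move the within-block axes next to the channel axis.
    x = x.transpose(0, 2, 4, 1, 3)          # (c, block, block, h/block, w/block)
    # Fold the block axes into channels.
    return x.reshape(c * block * block, h // block, w // block)

# Example: a 1x4x4 map becomes 4x2x2 with every pixel preserved.
x = np.arange(16).reshape(1, 4, 4)
y = space_to_depth(x, block=2)
print(y.shape)  # (4, 2, 2)
```

In SPD-Conv this rearrangement is followed by a non-strided convolution, so resolution is reduced without the information loss of stride-2 downsampling.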
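The pixel-level IoU idea behind PIoU can be illustrated by rasterizing two boxes onto a grid and counting overlapping pixels; for small objects, even a one-pixel overlap then contributes a nonzero score. This is a simplified sketch of the concept on axis-aligned integer boxes, not the paper's actual PIoU loss formulation:

```python
import numpy as np

def pixel_iou(box_a, box_b, size=64):
    """IoU of two (x1, y1, x2, y2) boxes, computed by counting pixels
    on a size x size boolean grid rather than from analytic areas."""
    def rasterize(box):
        m = np.zeros((size, size), dtype=bool)
        x1, y1, x2, y2 = map(int, box)
        m[y1:y2, x1:x2] = True  # mark every pixel covered by the box
        return m

    ma, mb = rasterize(box_a), rasterize(box_b)
    inter = np.logical_and(ma, mb).sum()  # overlapping pixel count
    union = np.logical_or(ma, mb).sum()   # covered pixel count
    return inter / union if union else 0.0

# Example: two 4x4 boxes overlapping on a 2x2 patch -> 4 / 28.
print(pixel_iou((0, 0, 4, 4), (2, 2, 6, 6)))
```

A pixel-counting formulation also extends naturally to non-rectangular or rotated regions, where analytic intersection areas are harder to derive.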
Pages: 11