SRE-YOLOv8: An Improved UAV Object Detection Model Utilizing Swin Transformer and RE-FPN

Cited by: 2
Authors
Li, Jun [1, 2]
Zhang, Jiajie [1, 2]
Shao, Yanhua [3]
Liu, Feng [4, 5]
Affiliations
[1] Beijing Informat Sci & Technol Univ, Artificial Intelligence Secur Innovat Res, Beijing 100192, Peoples R China
[2] Beijing Informat Sci & Technol Univ, Dept Informat Secur, Beijing 100192, Peoples R China
[3] Natl Comp Syst Engn Res Inst China, Beijing 100083, Peoples R China
[4] East China Normal Univ, Sch Comp Sci & Technol, Shanghai 200062, Peoples R China
[5] East China Normal Univ, Shanghai Int Sch Chief Technol Officer, Shanghai 200062, Peoples R China
Keywords
deep learning; object detection; YOLOv8; Swin Transformer; feature pyramid network; computational perception; FRAMEWORK
DOI
10.3390/s24123918
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification code
070302; 081704
Abstract
To address the low detection accuracy on images captured by unmanned aerial vehicles (UAVs), which stems from the wide range of object sizes and types and from limited feature information, we present SRE-YOLOv8. Our method enhances the YOLOv8 object detection algorithm with the Swin Transformer and a lightweight residual feature pyramid network (RE-FPN). First, we introduce an optimized Swin Transformer module into the backbone network to preserve global contextual information during feature extraction and to extract a broader range of features through self-attention. Next, we integrate a Residual Feature Augmentation (RFA) module and a lightweight attention mechanism, ECA, transforming the original FPN structure into RE-FPN and strengthening the network's emphasis on critical features. In addition, a small object detection (SOD) layer is incorporated to improve the network's ability to capture spatial information, increasing accuracy on small objects. Finally, we employ a Dynamic Head equipped with multiple attention mechanisms as the detection head to improve the identification of low-resolution targets against complex backgrounds. Experiments on the VisDrone2021 dataset show a 9.2% improvement over the original YOLOv8 algorithm.
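The abstract names ECA as the lightweight attention mechanism used in RE-FPN. As a rough illustration of the general ECA idea (global average pooling per channel, a 1D convolution across neighbouring channels, a sigmoid gate, then channel rescaling), here is a minimal pure-Python sketch. It is not the authors' implementation: the function names `eca_weights` and `eca`, the fixed kernel values, and the toy input are all assumptions made for illustration, and a real model would learn the kernel weights.

```python
import math

def eca_weights(feature_map, kernel):
    """Return one attention weight in (0, 1) per channel.

    feature_map: list of channels, each a list of rows of floats (C x H x W).
    kernel: odd-length list of 1D convolution weights shared across channels
            (hypothetical fixed values here; learned in a real network).
    """
    # Global average pooling: one scalar descriptor per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad  # zero padding keeps C outputs
    weights = []
    for c in range(len(pooled)):
        # 1D convolution over neighbouring channel descriptors.
        z = sum(k * padded[c + j] for j, k in enumerate(kernel))
        weights.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid gate
    return weights

def eca(feature_map, kernel):
    """Rescale every channel by its attention weight."""
    weights = eca_weights(feature_map, kernel)
    return [
        [[w * v for v in row] for row in channel]
        for channel, w in zip(feature_map, weights)
    ]

# Toy input: 3 channels of 2x2 values; the kernel values are arbitrary.
x = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
y = eca(x, kernel=[0.5, 1.0, 0.5])
```

The output `y` has the same shape as `x`; each channel is scaled by a weight that depends only on its own pooled value and those of its neighbouring channels, which is what keeps this kind of attention cheap.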
Pages: 20
Related papers
52 in total
  • [1] VFNet: A Convolutional Architecture for Accent Classification
    Ahmed, Asad
    Tangri, Pratham
    Panda, Anirban
    Ramani, Dhruv
    Nevronas, Samarjit Karmakar
    [J]. 2019 IEEE 16TH INDIA COUNCIL INTERNATIONAL CONFERENCE (IEEE INDICON 2019), 2019
  • [2] An improved faster-RCNN model for handwritten character recognition
    Albahli, Saleh
    Nawaz, Marriam
    Javed, Ali
    Irtaza, Aun
    [J]. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2021, 46 (09): 8509-8523
  • [3] Ba Jimmy Lei, 2016, arXiv
  • [4] Chen Penglei [陈朋磊], 2023, Journal of Electronic Measurement and Instrument [电子测量与仪器学报], V37, P183
  • [5] Dynamic Head: Unifying Object Detection Heads with Attentions
    Dai, Xiyang
    Chen, Yinpeng
    Xiao, Bin
    Chen, Dongdong
    Liu, Mengchen
    Yuan, Lu
    Zhang, Lei
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 7369-7378
  • [6] Histograms of oriented gradients for human detection
    Dalal, N
    Triggs, B
    [J]. 2005 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2005: 886-893
  • [7] Dosovitskiy A, 2021, arXiv, DOI arXiv:2010.11929
  • [8] CenterNet: Keypoint Triplets for Object Detection
    Duan, Kaiwen
    Bai, Song
    Xie, Lingxi
    Qi, Honggang
    Huang, Qingming
    Tian, Qi
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 6568-6577
  • [9] HGNAS++: Efficient Architecture Search for Heterogeneous Graph Neural Networks
    Gao, Yang
    Zhang, Peng
    Zhou, Chuan
    Yang, Hong
    Li, Zhao
    Hu, Yue
    Yu, Philip S.
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (09): 9448-9461
  • [10] GraphNAS++: Distributed Architecture Search for Graph Neural Networks
    Gao, Yang
    Zhang, Peng
    Yang, Hong
    Zhou, Chuan
    Hu, Yue
    Tian, Zhihong
    Li, Zhao
    Zhou, Jingren
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (07): 6973-6987