SMFF-YOLO: A Scale-Adaptive YOLO Algorithm with Multi-Level Feature Fusion for Object Detection in UAV Scenes

Citations: 18
Authors
Wang, Yuming [1 ,2 ]
Zou, Hua [1 ]
Yin, Ming [2 ]
Zhang, Xining [1 ]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
[2] Wuhan Text Univ, Sch Elect & Elect Engn, Wuhan 430077, Peoples R China
Keywords
object detection; unmanned aerial vehicles; tiny objects; complex scenarios; multi-level feature information fusion; NETWORK;
DOI
10.3390/rs15184580
Chinese Library Classification (CLC): X [Environmental Science, Safety Science]
Discipline Classification Code: 08; 0830
Abstract
Object detection in images captured by unmanned aerial vehicles (UAVs) holds great potential across domains including civilian applications, urban planning, and disaster response. However, it faces several challenges, such as multi-scale variations, dense scenes, complex backgrounds, and tiny objects. In this paper, we present a novel scale-adaptive YOLO framework called SMFF-YOLO, which addresses these challenges through a multi-level feature fusion approach. To improve the detection of small objects, our framework incorporates the ELAN-SW object detection prediction head. This newly designed head effectively exploits both global contextual information and local features, enhancing the detection accuracy of tiny objects. Additionally, the proposed bidirectional feature fusion pyramid (BFFP) module tackles scale variation in object sizes by aggregating multi-scale features. To handle complex backgrounds, we introduce the adaptive atrous spatial pyramid pooling (AASPP) module, which enables adaptive feature fusion and alleviates the negative impact of cluttered scenes. Moreover, we adopt the Wise-IoU (WIoU) bounding box regression loss to balance the competitiveness of anchor boxes of different quality, giving the framework a more informed gradient allocation strategy. We validate the effectiveness of SMFF-YOLO on the VisDrone and UAVDT datasets. Experimental results demonstrate that our model achieves higher detection accuracy, with AP50 reaching 54.3% on VisDrone and 42.4% on UAVDT. Visual comparisons with other YOLO-based methods further illustrate the robustness and adaptability of our approach.
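The WIoU regression loss mentioned in the abstract can be understood as a plain IoU loss scaled by a distance-based attention term computed from the box centers. The sketch below follows the WIoU v1 formulation; the abstract does not specify which WIoU variant SMFF-YOLO uses, and the function name `wiou_v1_loss` and the scalar (x1, y1, x2, y2) box interface are illustrative assumptions, not the authors' implementation:

```python
import math

def wiou_v1_loss(pred, target):
    """Sketch of a Wise-IoU v1 style loss for axis-aligned boxes.

    Boxes are (x1, y1, x2, y2). The IoU loss (1 - IoU) is scaled by a
    center-distance attention factor normalized by the diagonal of the
    smallest enclosing box; in the original formulation that factor is
    detached so no gradient flows through it.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection area of the two boxes
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih

    # Union area and plain IoU loss
    area_p = (px2 - px1) * (py2 - py1)
    area_t = (tx2 - tx1) * (ty2 - ty1)
    union = area_p + area_t - inter
    iou = inter / union if union > 0 else 0.0
    l_iou = 1.0 - iou

    # Width/height of the smallest enclosing box
    wg = max(px2, tx2) - min(px1, tx1)
    hg = max(py2, ty2) - min(py1, ty1)

    # Center-distance attention: grows as the centers drift apart,
    # so low-overlap (harder) boxes receive a larger gradient weight.
    cxp, cyp = (px1 + px2) / 2.0, (py1 + py2) / 2.0
    cxt, cyt = (tx1 + tx2) / 2.0, (ty1 + ty2) / 2.0
    r = math.exp(((cxp - cxt) ** 2 + (cyp - cyt) ** 2) / (wg ** 2 + hg ** 2))

    return r * l_iou
```

For a perfect prediction the attention factor is exp(0) = 1 and the IoU loss is 0, so the total loss is 0; for disjoint boxes the loss exceeds the plain IoU loss of 1, which is the "more informed gradient allocation" the abstract refers to.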
Pages: 23