YOLO-ESFM: A multi-scale YOLO algorithm for sea surface object detection

Cited by: 0
Authors
Wei, Maochun [1]
Chen, Keyu [2]
Yan, Fei [2]
Ma, Jikang [2]
Liu, Kaiming [2]
Cheng, En [2]
Affiliations
[1] Xiamen Ocean Vocat Coll, Xiamen Lab Intelligent Fishery, Xiamen, Fujian, Peoples R China
[2] Xiamen Univ, Key Lab Underwater Acoust Commun & Marine Informat, Minist Educ, Xiamen 361000, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Scale fusion; YOLO; Deep learning; Object detection; Ocean
DOI
10.1016/j.ijnaoe.2025.100651
Chinese Library Classification (CLC)
U6 [Waterway Transportation]; P75 [Ocean Engineering]
Subject classification codes
0814; 081505; 0824; 082401
Abstract
Environmental perception and object detection are pivotal research topics in the marine domain. The sea surface presents unique challenges, including harsh weather conditions, wave interference, and multi-scale targets, which often degrade detection results. To address these issues, we integrate the Efficient Scale Fusion Module (ESFM) into the YOLO architecture, yielding an enhanced model, YOLO-ESFM. The ESFM serves as both the backbone and the detection head of the network, and it significantly improves performance over the YOLOv5s, YOLOv7-tiny, and YOLOv7 baselines. Furthermore, to address the limitations of CIOU in YOLOv7, we introduce an improved method, ZIOU, which is evaluated and shown to be effective on the Sea Surface Target Dataset. Comparative studies demonstrate that YOLO-ESFM remains efficient in terms of parameters and FLOPs while surpassing YOLOv7 in detection accuracy on both the Sea Surface Target Dataset and the PASCAL VOC 07+12 dataset.
Pages: 9
Related papers
39 in total
  • [1] [Anonymous], 2012, VOC2012 RESULTS
  • [2] [Anonymous], 2007, VOC VOC2007 WORKSH I
  • [3] Bochkovskiy A, 2020, arXiv, DOI 10.48550/arXiv.2004.10934
  • [4] Carion Nicolas, 2020, Computer Vision - ECCV 2020. 16th European Conference. Proceedings. Lecture Notes in Computer Science (LNCS 12346), P213, DOI 10.1007/978-3-030-58452-8_13
  • [5] Dai JF, 2016, ADV NEUR IN, V29
  • [6] Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
  • [7] Res2Net: A New Multi-Scale Backbone Architecture
    Gao, Shang-Hua
    Cheng, Ming-Ming
    Zhao, Kai
    Zhang, Xin-Yu
    Yang, Ming-Hsuan
    Torr, Philip
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (02) : 652 - 662
  • [8] Ge Z., 2021, arXiv, DOI arXiv:2107.08430
  • [9] Fast R-CNN
    Girshick, Ross
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 1440 - 1448
  • [10] Rich feature hierarchies for accurate object detection and semantic segmentation
    Girshick, Ross
    Donahue, Jeff
    Darrell, Trevor
    Malik, Jitendra
    [J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014: 580 - 587