YOLO-ESFM: A multi-scale YOLO algorithm for sea surface object detection

Cited by: 0

Authors:

Wei, Maochun [1]

Chen, Keyu [2]

Yan, Fei [2]

Ma, Jikang [2]

Liu, Kaiming [2]

Cheng, En [2]

Affiliations:

[1] Xiamen Ocean Vocat Coll, Xiamen Lab Intelligent Fishery, Xiamen, Fujian, Peoples R China

[2] Xiamen Univ, Key Lab Underwater Acoust Commun & Marine Informat, Minist Educ, Xiamen 361000, Peoples R China

Source:

INTERNATIONAL JOURNAL OF NAVAL ARCHITECTURE AND OCEAN ENGINEERING | 2025 / Volume 17

Funding:

National Natural Science Foundation of China;

Keywords:

Scale fusion; YOLO; Deep learning; Object detection; Ocean;

DOI:

10.1016/j.ijnaoe.2025.100651

Chinese Library Classification (CLC) number:

U6 [Waterway Transportation]; P75 [Ocean Engineering];

Discipline classification codes:

0814 ; 081505 ; 0824 ; 082401 ;

Abstract:

Environmental perception and object detection are pivotal research topics in the marine domain. The sea surface presents unique challenges, including harsh weather conditions, wave interference, and multi-scale targets, often resulting in suboptimal detection results. To address these issues, we present an innovative solution: the integration of the Efficient Scale Fusion Module (ESFM) into the advanced YOLO architecture, resulting in the enhanced model, YOLO-ESFM. The ESFM serves as both the backbone and detection head of the network, significantly improving performance compared to the baseline models in YOLOv5s, YOLOv7tiny, and YOLOv7. Furthermore, to tackle the limitations of the CIOU in YOLOv7, we introduce an improved method, ZIOU, which has been rigorously evaluated and proven effective on the Sea Surface Target Dataset. Comparative studies demonstrate that YOLO-ESFM not only maintains efficiency in terms of parameters and FLOPs but also surpasses YOLOv7 in detection accuracy on both the Sea Surface Target Dataset and the PASCAL VOC 07+12 Dataset.


Pages: 9
