Underwater object detection by integrating YOLOv8 and efficient transformer

Cited by: 1
Authors
Liu, Jing [1]
Sun, Kaiqiong [1]
Ye, Xiao [1]
Yun, Yaokun [1]
Affiliation
[1] Wuhan Polytech Univ, Sch Math & Comp Sci, Wuhan, Peoples R China
Keywords
underwater object detection; transformer; YOLOv8; attention mechanism; bi-directional feature pyramid network;
DOI
10.1117/1.JEI.33.4.043011
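Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code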
0808 ; 0809 ;
Abstract
In recent years, deep-learning-based underwater target detection algorithms have greatly advanced marine science and underwater robotics. However, the complexity of the underwater environment causes problems such as target occlusion, overlap, background confusion, and small objects, all of which make detection difficult. To address these problems, this paper proposes an improved underwater target detection method based on YOLOv8s. First, a lightweight backbone network with efficient transformers replaces the original backbone, which enhances contextual feature extraction. Second, an improved bidirectional feature pyramid network is used in the later multi-scale fusion stage; it increases the input of bottom-level information while reducing the model size and the number of parameters. Finally, a dynamic head with an attention mechanism is introduced into the detection head to improve the classification and localization of small and fuzzy targets. Experimental results show that, relative to YOLOv8s, the proposed method raises mAP0.5:0.95 from 65.7%, 63.7%, and 51.2% to 69.2%, 66.8%, and 54.8% on three public underwater datasets, DUO, RUOD, and URPC2020, respectively. In addition, compared with the YOLOv8s model, the model size decreases from 21.46 to 15.56 MB, and the number of parameters decreases from 11.1 to 7.9 M. (c) 2024 SPIE and IS&T
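The reported figures are easier to compare as differences. The following is a minimal sketch that only restates the numbers given in the abstract; the variable and function names are illustrative and do not come from the paper.

```python
# Figures copied from the abstract. The names and layout here are
# illustrative assumptions, not part of the paper or any released code.

# mAP0.5:0.95 in percent, as (YOLOv8s baseline, proposed method).
MAP_50_95 = {
    "DUO": (65.7, 69.2),
    "RUOD": (63.7, 66.8),
    "URPC2020": (51.2, 54.8),
}
MODEL_SIZE_MB = (21.46, 15.56)  # (baseline, proposed)
PARAMS_M = (11.1, 7.9)          # millions of parameters, (baseline, proposed)


def absolute_gain(baseline, improved):
    """Gain in percentage points, rounded to one decimal place."""
    return round(improved - baseline, 1)


def relative_reduction(baseline, improved):
    """Reduction as a percentage of the baseline, rounded to one decimal."""
    return round(100.0 * (baseline - improved) / baseline, 1)


map_gains = {name: absolute_gain(*pair) for name, pair in MAP_50_95.items()}
size_reduction_pct = relative_reduction(*MODEL_SIZE_MB)
params_reduction_pct = relative_reduction(*PARAMS_M)

if __name__ == "__main__":
    for name, gain in map_gains.items():
        print(f"{name}: +{gain} points mAP0.5:0.95")
    print(f"model size: -{size_reduction_pct}%")
    print(f"parameters: -{params_reduction_pct}%")
```

Under these figures, the gains are 3.5, 3.1, and 3.6 percentage points on DUO, RUOD, and URPC2020, and the model size and parameter count each shrink by a little over a quarter (about 27.5% and 28.8%).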
Pages: 15