Lightweight Underwater Object Detection Based on YOLO v4 and Multi-Scale Attentional Feature Fusion

Cited by: 147
Authors
Zhang, Minghua [1]
Xu, Shubo [1]
Song, Wei [1]
He, Qi [1]
Wei, Quanmiao [2]
Affiliations
[1] Shanghai Ocean Univ, Coll Informat Technol, Shanghai 201306, Peoples R China
[2] Minist Nat Resources, East China Sea Bur, Shanghai 200137, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
YOLO; lightweight network; underwater object detection; attention mechanism
DOI
10.3390/rs13224706
Chinese Library Classification
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Underwater object detection is a challenging and attractive task in computer vision. Although object detection techniques perform well on general datasets, low visibility and color bias in complex underwater environments generally degrade image quality; in addition, small and densely aggregated targets provide less extractable information, which makes satisfactory results difficult to achieve. Previous deep-learning research on underwater object detection has mainly focused on improving accuracy with large networks, and lightweight detection for marine environments has rarely received attention. The resulting models are large and slow, whereas applications in marine environments need real-time, lightweight detectors. To address this, a lightweight underwater object detection method based on MobileNet v2, the You Only Look Once (YOLO) v4 algorithm, and attentional feature fusion is proposed to balance accuracy and speed for target detection in marine environments. In our work, MobileNet v2 is combined with depth-wise separable convolution to reduce the number of model parameters and the model size. The Modified Attentional Feature Fusion (AFFM) module is designed to better fuse features with inconsistent semantics and scales and to improve accuracy. Experiments indicate that the proposed method obtained a mean average precision (mAP) of 81.67% on the PASCAL VOC dataset and 92.65% on the brackish dataset, and reached a processing speed of 44.22 frames per second (FPS) on the brackish dataset. Moreover, the number of model parameters and the model size were compressed to 16.76% and 19.53% of those of YOLO v4, respectively, achieving a good tradeoff between speed and accuracy for underwater object detection.
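To illustrate why depth-wise separable convolution reduces the parameter count, the sketch below compares the weights of a standard convolution with those of a depth-wise convolution followed by a 1 x 1 point-wise convolution. The layer dimensions (3 x 3 kernel, 256 input channels, 512 output channels) and the function names are hypothetical choices for illustration and are not taken from the paper; biases are omitted.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depth-wise conv plus a 1 x 1 point-wise conv (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer, chosen only for illustration.
standard = conv_params(256, 512, 3)
separable = depthwise_separable_params(256, 512, 3)
ratio = separable / standard  # equals 1/c_out + 1/k**2

print(standard, separable, round(ratio, 4))
```

For this single assumed layer the separable form needs roughly 11% of the weights of the standard convolution. The whole-model figures reported in the abstract (16.76% of the parameters of YOLO v4) depend on the full architecture and cannot be derived from this one-layer calculation.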
Pages: 22