One-stage CNN detector-based benthonic organisms detection with limited training dataset

Cited by: 39
Authors
Chen, Tingkai [2 ]
Wang, Ning [1 ,3 ]
Wang, Rongfeng [2 ]
Zhao, Hong [2 ]
Zhang, Guichen [4 ]
Affiliations
[1] Dalian Maritime Univ, Sch Marine Engn, Dalian 116026, Peoples R China
[2] Dalian Maritime Univ, Sch Marine Elect Engn, Dalian 116026, Peoples R China
[3] Harbin Engn Univ, Coll Shipbldg Engn, Harbin 150001, Peoples R China
[4] Shanghai Maritime Univ, Merchant Marine Coll, Shanghai 201306, Peoples R China
Keywords
Benthonic organisms detection; One-stage CNN detector; Generalized intersection over union; Benthonic organisms anchor boxes; Data augmentation; NETWORK; IMAGE
DOI
10.1016/j.neunet.2021.08.014
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, focusing on the challenges posed by the unique shape dimensions of benthonic organisms and the limited training dataset, a one-stage CNN detector-based benthonic organisms detection (OSCD-BOD) scheme is proposed. The main contributions are as follows: (1) The regression loss between the predicted bounding box and the ground-truth box is measured by the generalized intersection over union (GIoU), so that the localization accuracy of benthonic organisms is dramatically enhanced. (2) By devising K-means-based dimension clustering, multiple benthonic organisms anchor boxes (BOAB) that sufficiently exploit a priori dimension information are derived from the limited training dataset, thereby significantly improving recall. (3) A geometric and color transformations (GCT)-based data augmentation technique is further employed not only to efficiently prevent over-fitting during training but also to significantly enhance detection generalization in complex and changeable underwater environments. (4) The OSCD-BOD scheme is eventually established in a modular manner by integrating the GIoU, BOAB and GCT functionals. Comprehensive experiments and comparisons demonstrate that the proposed OSCD-BOD scheme outperforms typical approaches including Faster R-CNN, SSD, YOLOv2, YOLOv3 and CenterNet in terms of mean average precision by 6.88%, 10.92%, 12.44%, 3.05% and 1.09%, respectively. (C) 2021 Elsevier Ltd. All rights reserved.
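The two algorithmic ingredients named in the abstract can be made concrete with a short sketch. What follows is a minimal, illustrative sketch rather than the authors' implementation: it assumes axis-aligned boxes in (x1, y1, x2, y2) format and NumPy, and the function names giou and cluster_anchors are hypothetical. The first function computes the generalized intersection over union used as the box regression measure (the corresponding regression loss is typically taken as 1 - GIoU); the second performs K-means-style dimension clustering over ground-truth box widths and heights with 1 - IoU as the distance, which is the usual way a priori anchor boxes are derived from a training set for YOLO-family detectors.

import numpy as np

def giou(box_a, box_b, eps=1e-9):
    # Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2).
    xa1, ya1, xa2, ya2 = box_a
    xb1, yb1, xb2, yb2 = box_b
    iw = max(0.0, min(xa2, xb2) - max(xa1, xb1))      # intersection width
    ih = max(0.0, min(ya2, yb2) - max(ya1, yb1))      # intersection height
    inter = iw * ih
    union = (xa2 - xa1) * (ya2 - ya1) + (xb2 - xb1) * (yb2 - yb1) - inter
    iou = inter / (union + eps)
    cw = max(xa2, xb2) - min(xa1, xb1)                # enclosing box width
    ch = max(ya2, yb2) - min(ya1, yb1)                # enclosing box height
    c_area = cw * ch
    return iou - (c_area - union) / (c_area + eps)    # GIoU lies in [-1, 1]

def cluster_anchors(wh, k=9, iters=100, seed=0):
    # K-means over ground-truth (width, height) pairs using 1 - IoU as the
    # distance; wh is an (N, 2) array taken from the training annotations.
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # IoU of every box against every anchor, top-left corners assumed aligned.
        inter = (np.minimum(wh[:, None, 0], anchors[None, :, 0]) *
                 np.minimum(wh[:, None, 1], anchors[None, :, 1]))
        union = wh[:, 0:1] * wh[:, 1:2] + anchors[:, 0] * anchors[:, 1] - inter
        assign = np.argmax(inter / union, axis=1)      # highest IoU = nearest anchor
        new = np.array([wh[assign == i].mean(axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors.prod(axis=1))]   # sorted by box area

With the annotated training boxes collected into an (N, 2) width-height array, cluster_anchors(wh, k=9) would return k representative anchor sizes that can then be distributed across the detector's output scales; the geometric and color transformations mentioned in contribution (3) are standard augmentation operations (flips, rotations, brightness and hue jitter) and are not repeated here.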
Pages: 247-259
Page count: 13
Related papers (55 in total)
  • [1] Ancuti C. Proc. IEEE CVPR, 2012: 81. DOI 10.1109/CVPR.2012.6247661
  • [2] [Anonymous]. MOBILENETS EFFICIENT, 2017
  • [3] Bai Zhongxin, Zhang Xiao-Lei. Speaker recognition based on deep learning: An overview. Neural Networks, 2021, 140: 65-99
  • [4] Bay Herbert, Ess Andreas, Tuytelaars Tinne, Van Gool Luc. Speeded-Up Robust Features (SURF). Computer Vision and Image Understanding, 2008, 110(3): 346-359
  • [5] Blas Morten Rufus, Blanke Mogens. Stereo vision with texture learning for fault-tolerant automatic baling. Computers and Electronics in Agriculture, 2011, 75(1): 159-168
  • [6] Chen Kean, Li Jianguo, Lin Weiyao, See John, Wang Ji, Duan Lingyu, Chen Zhibo, He Changwei, Zou Junni. Towards Accurate One-Stage Object Detection with AP-Loss. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019: 5114-5122
  • [7] Dalal N., Triggs B. Histograms of oriented gradients for human detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 1, 2005: 886-893
  • [8] Deng J. Proc. IEEE CVPR, 2009: 248. DOI 10.1109/CVPRW.2009.5206848
  • [9] Duan Kaiwen, Bai Song, Xie Lingxi, Qi Honggang, Huang Qingming, Tian Qi. CenterNet: Keypoint Triplets for Object Detection. 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), 2019: 6568-6577
  • [10] Elfwing Stefan, Uchibe Eiji, Doya Kenji. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks, 2018, 107: 3-11