Image-based taxonomic classification of bulk insect biodiversity samples using deep learning and domain adaptation

Cited by: 6
Authors
Fujisawa, Tomochika [1, 6]
Noguerales, Victor [2, 3, 7]
Meramveliotakis, Emmanouil [2]
Papadopoulou, Anna [2]
Vogler, Alfried P. [4, 5]
Affiliations
[1] Shiga Univ, Ctr Data Sci Educ & Res, Hikone, Japan
[2] Univ Cyprus, Dept Biol Sci, Nicosia, Cyprus
[3] Inst Prod Nat & Agrobiol IPNA CSIC, Tenerife, Spain
[4] Nat Hist Museum, Dept Life Sci, London, England
[5] Imperial Coll London, Dept Life Sci, Silwood Pk Campus, Ascot, England
[6] Shiga Univ, Ctr Data Sci Educ & Res, 1-1-1 Banba, Hikone, Shiga 5228522, Japan
[7] Inst Prod Nat & Agrobiol IPNA CSIC, Astrofis Francisco Sanchez 3, Tenerife 38206, Spain
Keywords
biodiversity assessment; bulk sample; Coleoptera; convolutional neural network; domain adaptation; image classification; machine learning; DIVERSITY; SHOW
DOI
10.1111/syen.12583
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Complex bulk samples of insects from biodiversity surveys present a challenge for taxonomic identification, which could be overcome by high-throughput imaging combined with machine learning for rapid classification of specimens. These procedures require that taxonomic labels from an existing source data set are used for model training and prediction of an unknown target sample. However, such transfer learning may be problematic for the study of new samples not previously encountered in an image set, for example, from unexplored ecosystems, and require methods of domain adaptation that reduce the differences in the feature distribution of the source and target domains (training and test sets). We assessed the efficiency of domain adaptation for family-level classification of bulk samples of Coleoptera, as a critical first step in the characterization of biodiversity samples. Neural network models trained with images from a global database of Coleoptera were applied to a biodiversity sample from understudied forests in Cyprus as the target. Within-dataset classification accuracy reached 98% and depended on the number and quality of training images, and on dataset complexity. The accuracy of between-datasets predictions (across disparate source-target pairs that do not share any species or genera) was at most 82% and depended greatly on the standardization of the imaging procedure. An algorithm for domain adaptation, domain adversarial training of neural networks (DANN), significantly improved the prediction performance of models trained by non-standardized, low-quality images. Our findings demonstrate that existing databases can be used to train models and successfully classify images from unexplored biota, but the imaging conditions and classification algorithms need careful consideration.
Pages: 387-401
Page count: 15