Image-based taxonomic classification of bulk insect biodiversity samples using deep learning and domain adaptation

Cited by: 6
Authors
Fujisawa, Tomochika [1, 6]
Noguerales, Victor [2, 3, 7]
Meramveliotakis, Emmanouil [2 ]
Papadopoulou, Anna [2 ]
Vogler, Alfried P. [4 ,5 ]
Affiliations
[1] Shiga Univ, Ctr Data Sci Educ & Res, Hikone, Japan
[2] Univ Cyprus, Dept Biol Sci, Nicosia, Cyprus
[3] Inst Prod Nat & Agrobiol IPNA CSIC, Tenerife, Spain
[4] Nat Hist Museum, Dept Life Sci, London, England
[5] Imperial Coll London, Dept Life Sci, Silwood Pk Campus, Ascot, England
[6] Shiga Univ, Ctr Data Sci Educ & Res, 1-1-1 Banba, Hikone, Shiga 5228522, Japan
[7] Inst Prod Nat & Agrobiol IPNA CSIC, Astrofis Francisco Sanchez 3, Tenerife 38206, Spain
Keywords
biodiversity assessment; bulk sample; Coleoptera; convolutional neural network; domain adaptation; image classification; machine learning; DIVERSITY; SHOW
DOI
10.1111/syen.12583
Chinese Library Classification (CLC)
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Complex bulk samples of insects from biodiversity surveys present a challenge for taxonomic identification, which could be overcome by high-throughput imaging combined with machine learning for rapid classification of specimens. These procedures require that taxonomic labels from an existing source data set are used for model training and prediction of an unknown target sample. However, such transfer learning may be problematic for the study of new samples not previously encountered in an image set, for example, from unexplored ecosystems, and may require methods of domain adaptation that reduce the differences in the feature distribution of the source and target domains (training and test sets). We assessed the efficiency of domain adaptation for family-level classification of bulk samples of Coleoptera, as a critical first step in the characterization of biodiversity samples. Neural network models trained with images from a global database of Coleoptera were applied to a biodiversity sample from understudied forests in Cyprus as the target. Within-dataset classification accuracy reached 98% and depended on the number and quality of training images, and on dataset complexity. The accuracy of between-dataset predictions (across disparate source-target pairs that do not share any species or genera) was at most 82% and depended greatly on the standardization of the imaging procedure. An algorithm for domain adaptation, domain adversarial training of neural networks (DANN), significantly improved the prediction performance of models trained on non-standardized, low-quality images. Our findings demonstrate that existing databases can be used to train models and successfully classify images from unexplored biota, but the imaging conditions and classification algorithms need careful consideration.
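
The abstract names DANN but does not describe its mechanics. As a rough illustration only, the core idea (a gradient reversal step between the feature extractor and a domain classifier) might be sketched as below. This is a toy in plain Python, not the authors' implementation: the names `dann_lambda`, `GradientReversal` and `feature_gradient` are hypothetical, and the ramp-up schedule with `gamma=10` is assumed from the original DANN formulation rather than from this record.

```python
import math


def dann_lambda(progress, gamma=10.0):
    """Adaptation weight that ramps from 0 towards 1 as training
    progress goes from 0 to 1 (assumed schedule, gamma is a guess)."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam
    in the backward pass."""

    def __init__(self, lam):
        self.lam = lam

    def forward(self, features):
        # Features pass through unchanged to the domain classifier.
        return list(features)

    def backward(self, grad_output):
        # The reversed sign pushes the feature extractor to make
        # source and target images harder to tell apart.
        return [-self.lam * g for g in grad_output]


def feature_gradient(grad_label, grad_domain, lam):
    """Combine the gradients reaching the feature extractor from the
    label (family) head and the domain (source vs target) head."""
    grl = GradientReversal(lam)
    reversed_domain = grl.backward(grad_domain)
    return [gl + gd for gl, gd in zip(grad_label, reversed_domain)]


if __name__ == "__main__":
    # Hypothetical gradient values, for illustration only.
    lam = dann_lambda(progress=0.5)
    print(feature_gradient([1.0, 2.0], [0.5, -1.0], lam))
```

In a real model the same reversal would be applied to tensors inside an automatic differentiation framework; the list arithmetic here only shows the sign flip and the weighting.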
Pages: 387-401
Page count: 15