Inferring Taxonomic Affinities and Genetic Distances Using Morphological Features Extracted from Specimen Images: A Case Study with a Bivalve Data Set

Authors
Hofmann, Martin [1]
Kiel, Steffen [2]
Koesters, Lara M. [3]
Waeldchen, Jana [3, 4]
Maeder, Patrick [1, 4, 5]
Affiliations
[1] Technische Universität Ilmenau, D-98693 Ilmenau, Germany
[2] Swedish Museum of Natural History, Department of Palaeobiology, S-10405 Stockholm, Sweden
[3] Max Planck Institute for Biogeochemistry, Department of Biogeochemical Integration, D-07745 Jena, Germany
[4] German Centre for Integrative Biodiversity Research (iDiv), Leipzig, Germany
[5] Friedrich Schiller University, Faculty of Biological Sciences, D-07745 Jena, Germany
Keywords
Bivalves; deep learning; morphology inference; phylogenetics; similarity learning; CRETACEOUS MASS EXTINCTION; DEEP; CLASSIFICATION; SYSTEM; LIFE; TREE
DOI
10.1093/sysbio/syae042
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Reconstructing the tree of life and understanding the relationships among taxa are core questions in evolutionary and systematic biology. Most advances in this field over the last decades have come from molecular phylogenetics; however, molecular data are not available for most species. Here, we explore the applicability of two deep learning approaches, supervised classification and unsupervised similarity learning, for inferring organism relationships from specimen images. As a basis, we assembled an image data set covering 4144 bivalve species belonging to 74 families across all orders and subclasses of the extant Bivalvia, with molecular phylogenetic data available for all families and a complete taxonomic hierarchy for all species. An ablation study confirmed that this data set is suitable for deep learning experiments, yielding almost 80% accuracy for species-level identification. We performed three sets of experiments with this data set. First, we incorporated the taxonomic hierarchy and genetic distances into a supervised learning approach to obtain predictions at several taxonomic levels simultaneously. In doing so, we encouraged the model to treat features shared between closely related taxa as more important for their classification than features shared with distantly related taxa, imprinting phylogenetic and taxonomic affinities into the architecture and training procedure. Second, we used transfer learning and similarity learning approaches in zero-shot experiments to identify the higher-level taxonomic affinities of test species on which the models had not been trained. The models assigned these unknown species to their respective genera with approximately 48% and 67% accuracy. Lastly, we used unsupervised similarity learning to infer the relatedness of the images without prior knowledge of their taxonomic or phylogenetic affinities. The results clearly showed a correspondence between visual appearance and genetic relationships at the higher taxonomic levels. The correlation was 0.6 for the most species-rich subclass (Imparidentia) and ranged from 0.5 to 0.7 for the orders with the most images. At the family level, the overall correlation between visual similarity and genetic distances was 0.78. However, fine-grained reconstructions based on these correlations, such as resolving sister-taxon relationships, require further work. Taken together, our results broaden the applicability of automated taxon identification systems and provide a new avenue for estimating phylogenetic relationships from specimen images.
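The zero-shot experiments in the abstract assign a species the model never saw to a known genus. One simple way to picture this is a nearest-centroid rule in an embedding space. The following is a minimal sketch that assumes a trained network has already turned each image into a feature vector. The function names, the cosine-similarity measure, and the genus labels are hypothetical and do not come from the paper, whose actual transfer-learning and similarity-learning pipelines may differ.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def genus_centroids(embeddings, genera):
    """Average the embeddings of all training images that belong to each genus."""
    sums = {}
    counts = defaultdict(int)
    for vec, genus in zip(embeddings, genera):
        if genus not in sums:
            sums[genus] = [0.0] * len(vec)
        sums[genus] = [s + x for s, x in zip(sums[genus], vec)]
        counts[genus] += 1
    return {g: [s / counts[g] for s in total] for g, total in sums.items()}


def assign_genus(query, centroids):
    """Hypothetical zero-shot rule: return the genus whose centroid is most
    cosine-similar to the embedding of an image from an unseen species."""
    return max(centroids, key=lambda g: cosine(query, centroids[g]))


# Toy 2-D embeddings standing in for network outputs (hypothetical values).
train_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
train_genera = ["GenusA", "GenusA", "GenusB", "GenusB"]
centroids = genus_centroids(train_embeddings, train_genera)
print(assign_genus([0.95, 0.05], centroids))  # GenusA
```

Averaging per genus keeps the rule from favouring genera that simply have more training images. A k-nearest-neighbour vote over individual image embeddings would be an equally plausible alternative.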
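The family-level correlation of 0.78 compares two sets of pairwise distances: how different families look and how far apart they are genetically. A common way to quantify such agreement is a Pearson correlation over the off-diagonal entries of two distance matrices. Because pairwise distances are not independent, significance is then usually assessed with a Mantel permutation test. This sketch assumes both matrices are already available and aligned by family. The function names and toy values are hypothetical, and the abstract does not state that the authors used exactly this statistic.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square, symmetric matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mantel(visual, genetic, permutations=999, seed=0):
    """Correlate two distance matrices and estimate a one-sided p-value by
    permuting rows and columns of the visual matrix together (Mantel test)."""
    observed = pearson(upper_triangle(visual), upper_triangle(genetic))
    rng = random.Random(seed)
    n = len(visual)
    order = list(range(n))
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[visual[order[i]][order[j]] for j in range(n)] for i in range(n)]
        # Small tolerance so permutations that reproduce the observed value
        # are not lost to floating-point rounding.
        if pearson(upper_triangle(permuted), upper_triangle(genetic)) >= observed - 1e-12:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (permutations + 1)


# Hypothetical distances among four families (rows and columns in the same order).
genetic = [[0, 1, 4, 5], [1, 0, 4, 5], [4, 4, 0, 2], [5, 5, 2, 0]]
visual = [[0, 2, 8, 10], [2, 0, 8, 10], [8, 8, 0, 4], [10, 10, 4, 0]]
r, p = mantel(visual, genetic)
print(f"r = {r:.2f}, p = {p:.3f}")
```

With only four families there are just 24 possible orderings, so the permutation p-value cannot become small here. The toy data only illustrate the mechanics.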
Pages: 920-940
Page count: 21
Related papers (27 in total)
  • [1] Case-based Diagnostic Classification Repeatability using Radiomic Features Extracted from Full-Field Digital Mammography Images of Breast Lesions
    Amstutz, Paul
    Drukker, Karen
    Li, Hui
    Abe, Hiroyuki
    Giger, Maryellen L.
    Whitney, Heather M.
    MEDICAL IMAGING 2021: COMPUTER-AIDED DIAGNOSIS, 2021, 11597
  • [2] Analyzing and visualizing morphological features using machine learning techniques and non-big data: A case study of macaque mandibles
    Morita, Takashi
    Ito, Tsuyoshi
    Koda, Hiroki
    Wakamori, Hikaru
    Nishimura, Takeshi
    AMERICAN JOURNAL OF BIOLOGICAL ANTHROPOLOGY, 2022, 178 (01): 44-53
  • [3] Permeability Prediction of Gas Diffusion Layers for PEMFC Using Three-Dimensional Convolutional Neural Networks and Morphological Features Extracted from X-ray Tomography Images
    You, Hangil
    Yun, Gun Jin
    COMPOSITES RESEARCH, 2024, 37 (01): 40-45
  • [4] Methodology for the Differential Diagnosis of a Complex Data Set: A Case Study Using Data from Routine CT Scan Examinations
    Wijesinha, A
    Begg, CB
    Funkenstein, HH
    McNeil, BJ
    MEDICAL DECISION MAKING, 1983, 3 (02): 133-154
  • [5] Resolution of pulmonary multiplanar reconstruction images from 0.5-mm theoretical isotropic data: A fundamental study using an inflated and fixed lung specimen
    Maki, Daisuke
    Takahashi, Masashi
    Ushio, Noritoshi
    Takazakura, Ryutaro
    Nitta, Norihisa
    Murata, Kiyoshi
    Kanazawa, Susumu
    ACTA MEDICA OKAYAMA, 2007, 61 (02): 63-69
  • [6] Inferring origin-destination trip matrices from aggregate volumes on groups of links: a case study using volumes inferred from mobile phone data
    Caceres, Noelia
    Romero, Luis M.
    Benitez, Francisco G.
    JOURNAL OF ADVANCED TRANSPORTATION, 2013, 47 (07): 650-666
  • [8] A new classification model for a class imbalanced data set using genetic programming and support vector machines: case study for wilt disease classification
    Pozi, Muhammad Syafiq Mohd
    Sulaiman, Md Nasir
    Mustapha, Norwati
    Perumal, Thinagaran
    REMOTE SENSING LETTERS, 2015, 6 (07): 568-577
  • [9] Automated Fault Classification of Reciprocating Compressors from Vibration Data: A Case Study on Optimization using Genetic Algorithm
    Lin, Yih-Hwang
    Lee, Wen-Sheng
    Wu, Chung-Yung
    37TH NATIONAL CONFERENCE ON THEORETICAL AND APPLIED MECHANICS (37TH NCTAM 2013) & THE 1ST INTERNATIONAL CONFERENCE ON MECHANICS (1ST ICM), 2014, 79: 355-361
  • [10] Increasing water losses from snow captured in the canopy of boreal forests: A case study using a 30 year data set
    Kozii, Nataliia
    Laudon, Hjalmar
    Ottosson-Lofvenius, Mikaell
    Hasselquist, Niles J.
    HYDROLOGICAL PROCESSES, 2017, 31 (20): 3558-3567