Validation of machine learning approach for direct mutation rate estimation

Cited by: 3
Authors
Burda, Katarzyna [1]
Konczal, Mateusz [1, 2]
Affiliations
[1] Adam Mickiewicz Univ, Fac Biol, Evolutionary Biol Grp, Poznan, Poland
[2] Adam Mickiewicz Univ, Fac Biol, Evolutionary Biol Grp, PL-60614 Poznan, Poland
Keywords
guppy; machine learning; mutation rate; teleost; whole-genome sequencing; DE-NOVO MUTATIONS; GERMLINE MUTATION; POPULATION HISTORY; METABOLIC-RATE; EVOLUTION; SELECTION; DYNAMICS; GENETICS; INSIGHTS; FORMAT;
DOI
10.1111/1755-0998.13841
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Mutations are the primary source of all genetic variation. Knowledge about their rates is critical for any evolutionary genetic analyses, but for a long time, that knowledge has remained elusive and indirectly inferred. In recent years, parent-offspring comparisons have yielded the first direct mutation rate estimates. The analyses are, however, challenging due to a high rate of false positives and the lack of consensus on standardized filtering of candidate de novo mutations. Here, we validate the application of a machine learning approach for such a task and estimate the mutation rate for the guppy (Poecilia reticulata), a model species in eco-evolutionary studies. We sequenced 4 parents and 20 offspring and screened their genomes for de novo mutations. The initial large number of candidate de novo mutations was hard-filtered to remove false-positive results. These results were compared with the mutation rate estimated with a supervised machine learning approach. Both approaches were followed by molecular validation of all candidate de novo mutations and yielded similar results. The ML method uniquely identified three mutations, but overall required more hands-on curation and had higher rates of false positives and false negatives. Both methods concordantly showed no difference in mutation rates between families. The guppy mutation rate estimated here is among the lowest directly estimated mutation rates in vertebrates; however, previous research has also found low estimated rates in other teleost fishes. We discuss potential explanations for such a pattern, as well as the future utility and limitations of machine learning approaches.
Pages: 1757-1771
Number of pages: 15
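
The abstract describes estimating a per-generation mutation rate from validated de novo mutations in parent-offspring data, but it does not give the formula or the counts used. As a rough illustration only, the sketch below applies the estimator commonly used in pedigree studies: the number of validated mutations divided by twice the number of callable haploid sites, summed over offspring, with an optional correction for the false-negative rate. The class name, field names, and all numeric values are hypothetical placeholders and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Offspring:
    """Per-offspring summary; all fields are hypothetical illustration names."""
    name: str
    validated_dnms: int   # de novo mutations confirmed by molecular validation
    callable_sites: int   # haploid genome positions that passed callability filters


def mutation_rate(offspring: Iterable[Offspring],
                  false_negative_rate: float = 0.0) -> float:
    """Estimate the per-site, per-generation mutation rate.

    Assumed estimator (not stated in the abstract):
        mu = total_dnms / (2 * total_callable_sites * (1 - FNR))
    The factor 2 accounts for the two parental haploid genomes
    transmitted to each diploid offspring.
    """
    if not 0.0 <= false_negative_rate < 1.0:
        raise ValueError("false_negative_rate must be in [0, 1)")
    records = list(offspring)
    total_dnms = sum(o.validated_dnms for o in records)
    total_sites = sum(o.callable_sites for o in records)
    if total_sites <= 0:
        raise ValueError("total callable sites must be positive")
    return total_dnms / (2 * total_sites * (1.0 - false_negative_rate))


# Placeholder data: two offspring with made-up counts.
toy_family = [
    Offspring("offspring_A", validated_dnms=3, callable_sites=500_000_000),
    Offspring("offspring_B", validated_dnms=2, callable_sites=500_000_000),
]
toy_rate = mutation_rate(toy_family)
```

Pooling counts across offspring before dividing weights each individual by its callable genome size. A between-family comparison like the one mentioned in the abstract could apply the same function to each family's offspring separately.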