Validation of machine learning approach for direct mutation rate estimation

Cited by: 3
Authors
Burda, Katarzyna [1]
Konczal, Mateusz [1, 2]
Affiliations
[1] Adam Mickiewicz Univ, Fac Biol, Evolutionary Biol Grp, Poznan, Poland
[2] Adam Mickiewicz Univ, Fac Biol, Evolutionary Biol Grp, PL-60614 Poznan, Poland
Keywords
guppy; machine learning; mutation rate; teleost; whole-genome sequencing; DE-NOVO MUTATIONS; GERMLINE MUTATION; POPULATION HISTORY; METABOLIC-RATE; EVOLUTION; SELECTION; DYNAMICS; GENETICS; INSIGHTS; FORMAT;
DOI
10.1111/1755-0998.13841
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Mutations are the primary source of all genetic variation. Knowledge of their rates is critical for any evolutionary genetic analysis, but for a long time that knowledge remained elusive and was inferred only indirectly. In recent years, parent-offspring comparisons have yielded the first direct mutation rate estimates. These analyses are, however, challenging because of a high rate of false positives and the lack of consensus on standardized filtering of candidate de novo mutations. Here, we validate the application of a machine learning approach to this task and estimate the mutation rate for the guppy (Poecilia reticulata), a model species in eco-evolutionary studies. We sequenced 4 parents and 20 offspring and screened their genomes for de novo mutations. The initially large number of candidate de novo mutations was hard-filtered to remove false positives. These results were compared with the mutation rate estimated using a supervised machine learning (ML) approach. Both approaches were followed by molecular validation of all candidate de novo mutations and yielded similar results. The ML method uniquely identified three mutations, but overall it required more hands-on curation and had higher rates of false positives and false negatives. Both methods concordantly showed no difference in mutation rates between families. The guppy mutation rate estimated here is among the lowest directly estimated mutation rates in vertebrates; however, previous research has also found low rates in other teleost fishes. We discuss potential explanations for this pattern, as well as the future utility and limitations of machine learning approaches.
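As a rough illustration of how a direct, pedigree-based estimate of this kind is commonly computed, the sketch below divides the number of validated de novo mutations by twice the total number of callable sites across offspring (two haploid genomes are transmitted per offspring), with an optional correction for false negatives. This is a generic textbook-style formula, not necessarily the exact procedure used in the paper; the function name `mutation_rate`, its parameters, and all numbers in the usage line are hypothetical and are not taken from the study.

```python
from typing import Sequence


def mutation_rate(
    n_validated_dnms: int,
    callable_sites_per_offspring: Sequence[int],
    false_negative_rate: float = 0.0,
) -> float:
    """Per-site, per-generation mutation rate for a diploid organism.

    Assumed formula (not confirmed by the source):
        mu = n / (2 * sum(callable_sites) * (1 - FNR))
    """
    if n_validated_dnms < 0:
        raise ValueError("mutation count must be non-negative")
    if not 0.0 <= false_negative_rate < 1.0:
        raise ValueError("false_negative_rate must be in [0, 1)")

    total_callable = sum(callable_sites_per_offspring)
    if total_callable <= 0:
        raise ValueError("total callable sites must be positive")

    # Factor 2: each offspring inherits one haploid genome from each parent.
    denominator = 2 * total_callable * (1.0 - false_negative_rate)
    return n_validated_dnms / denominator


if __name__ == "__main__":
    # Hypothetical inputs: 20 offspring, 500 Mb callable in each, 10 mutations.
    rate = mutation_rate(10, [500_000_000] * 20)
    print(f"{rate:.2e} mutations per site per generation")
```

Summing callable sites per offspring, rather than multiplying one genome size by the number of offspring, lets each individual contribute only the positions where a mutation could actually have been detected.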
Pages: 1757-1771
Page count: 15