Harnessing machine learning to guide phylogenetic-tree search algorithms

Cited by: 31
Authors
Azouri, Dana [1, 2]
Abadi, Shiran [1]
Mansour, Yishay [3]
Mayrose, Itay [1]
Pupko, Tal [2]
Affiliations
[1] Tel Aviv Univ, Sch Plant Sci & Food Secur, Tel Aviv, Israel
[2] Tel Aviv Univ, Shmunis Sch Biomed & Canc Res, Tel Aviv, Israel
[3] Tel Aviv Univ, Blavatnik Sch Comp Sci, Tel Aviv, Israel
Funding
Israel Science Foundation
Keywords
TERTIARY STRUCTURE; PROTEIN; MODELS; RECONSTRUCTION; PERFORMANCE; DATABASE; VERSION;
DOI
10.1038/s41467-021-22073-8
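Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract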
Inferring a phylogenetic tree is a fundamental challenge in evolutionary studies. Current paradigms for phylogenetic tree reconstruction rely on performing costly likelihood optimizations. With the aim of making tree inference feasible for problems involving more than a handful of sequences, inference under the maximum-likelihood paradigm integrates heuristic approaches to evaluate only a subset of all potential trees. Consequently, existing methods suffer from the known tradeoff between accuracy and running time. In this proof-of-concept study, we train a machine-learning algorithm over an extensive cohort of empirical data to predict the neighboring trees that increase the likelihood, without actually computing their likelihood. This provides a means to safely discard a large part of the search space, thus potentially accelerating heuristic tree searches without losing accuracy. Our analyses suggest that machine learning can guide tree-search methodologies towards the most promising candidate trees.
Editor's summary: Likelihood optimization in phylogenetic tree reconstruction is computationally intensive, especially as the number of sequences and taxa included increases. Here, Azouri et al. show how an artificial intelligence approach can reduce computational time without losing accuracy of tree inference.
Pages: 9