Harnessing machine learning to guide phylogenetic-tree search algorithms

Cited by: 31
Authors
Azouri, Dana [1, 2]
Abadi, Shiran [1]
Mansour, Yishay [3]
Mayrose, Itay [1]
Pupko, Tal [2]
Affiliations
[1] Tel Aviv Univ, Sch Plant Sci & Food Secur, Tel Aviv, Israel
[2] Tel Aviv Univ, Shmunis Sch Biomed & Canc Res, Tel Aviv, Israel
[3] Tel Aviv Univ, Blavatnik Sch Comp Sci, Tel Aviv, Israel
Funding
Israel Science Foundation;
Keywords
TERTIARY STRUCTURE; PROTEIN; MODELS; RECONSTRUCTION; PERFORMANCE; DATABASE; VERSION;
DOI
10.1038/s41467-021-22073-8
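Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy, earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code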
07 ; 0710 ; 09 ;
Abstract
Inferring a phylogenetic tree is a fundamental challenge in evolutionary studies. Current paradigms for phylogenetic tree reconstruction rely on performing costly likelihood optimizations. With the aim of making tree inference feasible for problems involving more than a handful of sequences, inference under the maximum-likelihood paradigm integrates heuristic approaches to evaluate only a subset of all potential trees. Consequently, existing methods suffer from the known tradeoff between accuracy and running time. In this proof-of-concept study, we train a machine-learning algorithm over an extensive cohort of empirical data to predict the neighboring trees that increase the likelihood, without actually computing their likelihood. This provides a means to safely discard a large portion of the search space, thus potentially accelerating heuristic tree searches without losing accuracy. Our analyses suggest that machine learning can guide tree-search methodologies towards the most promising candidate trees. Likelihood optimization in phylogenetic tree reconstruction is computationally intensive, especially as the number of sequences and taxa included increases. Here, Azouri et al. show how an artificial intelligence approach can reduce computational time without losing accuracy of tree inference.
Pages: 9
Related papers
51 in total
[11]   Maximum-likelihood phylogenetic analysis under a covarion-like model [J].
Galtier, N.
MOLECULAR BIOLOGY AND EVOLUTION, 2001, 18 (05) :866-873
[12]   BIONJ: An improved version of the NJ algorithm based on a simple model of sequence data [J].
Gascuel, O.
MOLECULAR BIOLOGY AND EVOLUTION, 1997, 14 (07) :685-695
[13]   A phylogenetic mixture model for the identification of functionally divergent protein residues [J].
Gaston, Daniel;
Susko, Edward;
Roger, Andrew J.
BIOINFORMATICS, 2011, 27 (19) :2655-2663
[14]   Polyploidy and sexual system in angiosperms: Is there an association? [J].
Glick, Lior;
Sabath, Niv;
Ashman, Tia-Lynn;
Goldberg, Emma;
Mayrose, Itay.
AMERICAN JOURNAL OF BOTANY, 2016, 103 (07) :1223-1235
[15]   New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0 [J].
Guindon, Stephane;
Dufayard, Jean-Francois;
Lefort, Vincent;
Anisimova, Maria;
Hordijk, Wim;
Gascuel, Olivier.
SYSTEMATIC BIOLOGY, 2010, 59 (03) :307-321
[16]   MetaPIGA v2.0: maximum likelihood large phylogeny estimation using the metapopulation genetic algorithm and other stochastic heuristics [J].
Helaers, Raphaël;
Milinkovitch, Michel C.
BMC BIOINFORMATICS, 2010, 11
[17]   Improving the efficiency of SPR moves in phylogenetic tree search methods based on maximum likelihood [J].
Hordijk, W;
Gascuel, O.
BIOINFORMATICS, 2005, 21 (24) :4338-4347
[18]   Performance of phylogenetic methods in simulation [J].
Huelsenbeck, JP.
SYSTEMATIC BIOLOGY, 1995, 44 (01) :17-48
[19]  
James G, 2013, SPRINGER TEXTS STAT, V103, P1, DOI 10.1007/978-1-4614-7138-7_1
[20]  
JUKES T H, 1969, P21