Harnessing machine learning to guide phylogenetic-tree search algorithms

Cited by: 31
Authors
Azouri, Dana [1, 2]
Abadi, Shiran [1]
Mansour, Yishay [3]
Mayrose, Itay [1]
Pupko, Tal [2]
Affiliations
[1] Tel Aviv Univ, Sch Plant Sci & Food Secur, Tel Aviv, Israel
[2] Tel Aviv Univ, Shmunis Sch Biomed & Canc Res, Tel Aviv, Israel
[3] Tel Aviv Univ, Blavatnik Sch Comp Sci, Tel Aviv, Israel
Funding
Israel Science Foundation
Keywords
TERTIARY STRUCTURE; PROTEIN; MODELS; RECONSTRUCTION; PERFORMANCE; DATABASE; VERSION;
DOI
10.1038/s41467-021-22073-8
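Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract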
Inferring a phylogenetic tree is a fundamental challenge in evolutionary studies. Current paradigms for phylogenetic tree reconstruction rely on performing costly likelihood optimizations. With the aim of making tree inference feasible for problems involving more than a handful of sequences, inference under the maximum-likelihood paradigm integrates heuristic approaches to evaluate only a subset of all potential trees. Consequently, existing methods suffer from the known tradeoff between accuracy and running time. In this proof-of-concept study, we train a machine-learning algorithm over an extensive cohort of empirical data to predict the neighboring trees that increase the likelihood, without actually computing their likelihood. This provides means to safely discard a large set of the search space, thus potentially accelerating heuristic tree searches without losing accuracy. Our analyses suggest that machine learning can guide tree-search methodologies towards the most promising candidate trees.

Likelihood optimization in phylogenetic tree reconstruction is computationally intensive, especially as the number of sequences and taxa included increases. Here, Azouri et al. show how an artificial intelligence approach can reduce computational time without losing accuracy of tree inference.
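
The filtering idea in the abstract (rank neighboring trees with a cheap learned predictor, then compute the costly likelihood only for the most promising ones) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `best_of_top_predicted`, the `keep_fraction` parameter, and the toy integer "trees" with made-up scoring functions are all assumptions introduced purely for illustration.

```python
import math
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def best_of_top_predicted(
    candidates: Sequence[T],
    predict_score: Callable[[T], float],
    expensive_likelihood: Callable[[T], float],
    keep_fraction: float = 0.25,
) -> Tuple[T, float, int]:
    """Evaluate the expensive likelihood only on the top-predicted candidates.

    Returns the best shortlisted candidate, its likelihood, and the number
    of expensive evaluations performed. All names here are hypothetical.
    """
    if not candidates:
        raise ValueError("no candidates to evaluate")
    n_keep = max(1, math.ceil(keep_fraction * len(candidates)))
    # Cheap step: rank every candidate by the predictor's score.
    shortlist = sorted(candidates, key=predict_score, reverse=True)[:n_keep]
    # Costly step: compute the likelihood for the shortlist only.
    scored = [(expensive_likelihood(c), c) for c in shortlist]
    best_ll, best = max(scored, key=lambda pair: pair[0])
    return best, best_ll, n_keep


# Toy stand-ins (assumptions): integers play the role of neighboring trees.
calls = []


def toy_likelihood(x: int) -> float:
    calls.append(x)                 # record each "expensive" evaluation
    return -float((x - 30) ** 2)    # the true optimum is x = 30


def toy_predictor(x: int) -> float:
    return -abs(x - 28)             # imperfect but correlated predictor


best, best_ll, n_evaluated = best_of_top_predicted(
    range(40), toy_predictor, toy_likelihood, keep_fraction=0.25
)
print(best, best_ll, n_evaluated, len(calls))
```

In this toy setting the predictor is slightly off yet still places the true optimum on the shortlist, so only a quarter of the candidates need the costly evaluation. Whether a real predictor achieves this on real trees is exactly the empirical question the paper examines.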
Pages: 9