Counting and sampling gene family evolutionary histories in the duplication-loss and duplication-loss-transfer models

Cited by: 0
Authors
Cedric Chauve
Yann Ponty
Michael Wallner
Affiliations
[1] Simon Fraser University, Department of Mathematics
[2] Université de Bordeaux, LaBRI
[3] Ecole Polytechnique, LIX
[4] Ecole Polytechnique, CNRS and LIX
[5] TU Wien, Institut für Diskrete Mathematik und Geometrie
Source
Journal of Mathematical Biology | 2020 / Vol. 80
Keywords
Phylogenetics; Enumerative combinatorics; Asymptotics; Sampling algorithms; 92B99; 05A15; 05A16
DOI
Not available
Abstract
Given a set of species whose evolution is represented by a species tree, a gene family is a group of genes having evolved from a single ancestral gene. A gene family evolves along the branches of a species tree through various mechanisms, including, but not limited to, speciation ($\mathbb{S}$), gene duplication ($\mathbb{D}$), gene loss ($\mathbb{L}$), and horizontal gene transfer ($\mathbb{T}$). The reconstruction of a gene tree representing the evolution of a gene family constrained by a species tree is an important problem in phylogenomics.
However, unlike in the multispecies coalescent evolutionary model that considers only speciation and incomplete lineage sorting events, very little is known about the search space for gene family histories accounting for gene duplication, gene loss and horizontal gene transfer (the $\mathbb{DLT}$-model). In this work, we introduce the notion of an evolutionary history, defined as a binary ordered rooted tree describing the evolution of a gene family, constrained by a species tree in the $\mathbb{DLT}$-model. We provide formal grammars describing the set of all evolutionary histories that are compatible with a given species tree, whether it is ranked or unranked. These grammars allow us, using either analytic combinatorics or dynamic programming, to efficiently compute the number of histories of a given size, and also to generate random histories of a given size under the uniform distribution. We apply these tools to obtain exact asymptotics for the number of gene family histories for two species trees, the rooted caterpillar and complete binary tree, as well as estimates of the range of the exponential growth factor of the number of histories for random species trees of size up to 25. Our results show that including horizontal gene transfers induces a dramatic increase in the number of evolutionary histories.
We also show that, within ranked species trees, the number of evolutionary histories in the $\mathbb{DLT}$-model is almost independent of the species tree topology. These results establish firm foundations for the development of ensemble methods for the prediction of reconciliations.
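The abstract's pairing of dynamic-programming counting with uniform random generation can be illustrated on a much simpler grammar than the ones in the paper. The sketch below is an assumption-laden toy: it uses the grammar T -> leaf | (T, T) for plain ordered rooted binary trees, with no species tree and no duplication, loss or transfer events, and the function names `count` and `sample` are hypothetical. It shows only the general pattern: count objects of each size by a recurrence read off the grammar, then sample by choosing each split with probability proportional to the number of objects it leads to (the so-called recursive method).

```python
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def count(n):
    """Number of ordered rooted binary trees with n leaves.

    Toy grammar (an assumption, not the paper's): T -> leaf | (T, T).
    """
    if n == 1:
        return 1
    # A tree with n leaves splits into a left subtree with k leaves
    # and a right subtree with n - k leaves.
    return sum(count(k) * count(n - k) for k in range(1, n))


def sample(n, rng=random):
    """Draw one tree with n leaves uniformly at random (recursive method)."""
    if n == 1:
        return "leaf"
    # Pick a rank among all trees of size n, then find the split it falls in.
    r = rng.randrange(count(n))
    for k in range(1, n):
        weight = count(k) * count(n - k)
        if r < weight:
            return (sample(k, rng), sample(n - k, rng))
        r -= weight
    raise AssertionError("unreachable: weights sum to count(n)")
```

Calling `count(n)` for n = 1, 2, 3, ... yields the Catalan numbers, and `sample(n)` returns a nested tuple of `"leaf"` strings. Because each split is chosen in proportion to the number of trees below it, every tree of size n is equally likely; the same idea extends to any unambiguous grammar with a size parameter.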
Pages: 1353-1388
Page count: 35