Ancestral Population Genomics: The Coalescent Hidden Markov Model Approach

Cited by: 74
Authors
Dutheil, Julien Y. [1]
Ganapathy, Ganesh [3]
Hobolth, Asger [1]
Mailund, Thomas [1]
Uyenoyama, Marcy K. [4]
Schierup, Mikkel H. [1, 2]
Affiliations
[1] Aarhus Univ, Bioinformat Res Ctr, DK-8000 Aarhus C, Denmark
[2] Aarhus Univ, Dept Biol Sci, DK-8000 Aarhus C, Denmark
[3] Natl Evolutionary Synth Ctr, Durham, NC 27705 USA
[4] Duke Univ, Dept Biol, Durham, NC 27708 USA
Keywords
DNA-SEQUENCES; EVOLUTIONARY TREES; RECOMBINATION; RATES; PREDICTION; LIBRARIES; HUMANS; SIZES; BIO++; MAP
DOI
10.1534/genetics.109.103010
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
With incomplete lineage sorting (ILS), the genealogy of closely related species differs along their genomes. The amount of ILS depends on population parameters such as the ancestral effective population sizes and the recombination rate, but also on the number of generations between speciation events. We use a hidden Markov model parameterized according to coalescent theory to infer the genealogy along a four-species genome alignment of closely related species and estimate population parameters. We analyze a basic, panmictic demographic model and study its properties using an extensive set of coalescent simulations. We assess the effect of the model assumptions and demonstrate that the Markov property provides a good approximation to the ancestral recombination graph. Using a too restricted set of possible genealogies, necessary to reduce the computational load, can bias parameter estimates. We propose a simple correction for this bias and suggest directions for future extensions of the model. We show that the patterns of ILS along a sequence alignment can be recovered efficiently together with the ancestral recombination rate. Finally, we introduce an extension of the basic model that allows for mutation rate heterogeneity and reanalyze human-chimpanzee-gorilla-orangutan alignments, using the new models. We expect that this framework will prove useful for population genomics and provide exciting insights into genome evolution.
Pages: 259-274
Number of pages: 16