Inferring complex DNA substitution processes on phylogenies using uniformization and data augmentation

Times cited: 25
Authors
Mateiu, L [1]
Rannala, B
Affiliations
[1] Univ Alberta, Dept Med Genet, Edmonton, AB T6G 2M7, Canada
[2] Univ Calif Davis, Genome Ctr, Davis, CA 95616 USA
[3] Univ Calif Davis, Sect Evolut & Ecol, Davis, CA 95616 USA
Keywords
Bayesian phylogenetic inference; Markov process; Metropolis-Hastings algorithm; molecular evolution; site-specific rates
DOI
10.1080/10635150500541599
Chinese Library Classification
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
A new method is developed for calculating sequence substitution probabilities using Markov chain Monte Carlo (MCMC) methods. The basic strategy is to use uniformization to transform the original continuous-time Markov process into a Poisson substitution process and a discrete Markov chain of state transitions. An efficient MCMC algorithm is outlined for evaluating substitution probabilities by this approach, using a continuous gamma distribution to model site-specific rates. The method is applied to the problem of inferring branch lengths and site-specific rates from nucleotide sequences under a general time-reversible (GTR) model, and a computer program, BYPASSR, is developed. Simulations are used to examine the performance of the new program relative to an existing program, BASEML, which uses a discrete approximation to the gamma-distributed prior on site-specific rates. BASEML and BYPASSR are found to agree closely when inferring branch lengths, regardless of the number of rate categories used, but BASEML tends to underestimate high site-specific substitution rates and to overestimate intermediate rates when fewer than 50 rate categories are used. Rate estimates obtained using BASEML agree more closely with those of BYPASSR as the number of rate categories increases. Analyses of the posterior distributions of site-specific rates from BYPASSR suggest that a large number of taxa are needed to obtain precise estimates of site-specific rates, especially when rates are very high or very low. The method is applied to analyze 45 sequences of the alpha 2B adrenergic receptor gene (A2AB) from a sample of eutherian taxa. In general, the pattern expected for regions under negative selection is observed: third codon positions have the highest inferred rates, followed by first codon positions, with second codon positions having the lowest. Several sites at second codon positions show exceptionally high substitution rates, which may reflect the effects of positive selection.
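The uniformization step described in the abstract rests on a standard identity: choose mu >= max_i |q_ii| and set R = I + Q/mu; then P(t) = exp(Qt) = sum over n of e^(-mu t) (mu t)^n / n! R^n, so substitutions arrive as Poisson(mu t) events, each a step of the discrete chain R (possibly a "virtual" step that leaves the state unchanged). The minimal Python sketch below assumes this construction; it is not BYPASSR's code, and every function name in it is hypothetical. It computes P(t) for a GTR rate matrix by the truncated series and, as one assumed form of the data-augmentation step, draws a substitution history on a branch conditioned on the states at both ends.

```python
import math
import random


def gtr_rate_matrix(exch, pi):
    """GTR rate matrix: q_ij = s_ij * pi_j for i != j, rows summing to zero,
    rescaled so the expected rate -sum_i pi_i q_ii equals 1 (branch lengths
    are then in expected substitutions per site). exch must be symmetric."""
    n = len(pi)
    Q = [[exch[i][j] * pi[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        Q[i][i] = -sum(Q[i])  # diagonal is still 0.0 here, so this is minus the off-diagonal sum
    scale = -sum(pi[i] * Q[i][i] for i in range(n))
    return [[q / scale for q in row] for row in Q]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def uniformization_terms(Q, t, tol=1e-12, max_terms=1000):
    """Uniformize Q with mu = max_i |q_ii| and R = I + Q/mu. Returns mu, R, the
    Poisson(mu*t) weights w_k and the powers R^k for k = 0..K, stopping once the
    remaining Poisson tail mass is below tol. exp(-mu*t) underflows for very
    large mu*t, so this naive version suits moderate branch lengths and rates."""
    n = len(Q)
    mu = max(-Q[i][i] for i in range(n))
    R = [[(1.0 if i == j else 0.0) + Q[i][j] / mu for j in range(n)] for i in range(n)]
    weights, powers = [math.exp(-mu * t)], [identity(n)]
    while 1.0 - sum(weights) >= tol and len(weights) < max_terms:
        k = len(weights)
        weights.append(weights[-1] * mu * t / k)
        powers.append(mat_mul(powers[-1], R))
    return mu, R, weights, powers


def transition_probs(Q, t):
    """P(t) = exp(Qt) computed as the truncated sum of w_k * R^k."""
    _, _, weights, powers = uniformization_terms(Q, t)
    n = len(Q)
    return [[sum(w * Rk[i][j] for w, Rk in zip(weights, powers)) for j in range(n)]
            for i in range(n)]


def sample_path(Q, a, b, t, rng):
    """Draw a substitution history on a branch of length t > 0, conditioned on
    start state a and end state b. Returns [(time, state), ...] from (0.0, a)."""
    _, R, weights, powers = uniformization_terms(Q, t)
    n = len(Q)
    # 1. Number of uniformized events: Pr(N = k | a, b) is proportional to w_k * (R^k)[a][b].
    terms = [w * Rk[a][b] for w, Rk in zip(weights, powers)]
    N = rng.choices(range(len(terms)), weights=terms)[0]
    # 2. Given N, the event times are N sorted uniform draws on [0, t].
    times = sorted(rng.uniform(0.0, t) for _ in range(N))
    # 3. Discrete-chain states, each drawn so the chain can still end in b:
    #    Pr(next = j | current s, m jumps left after it) is proportional to R[s][j] * (R^m)[j][b].
    states = [a]
    for k in range(1, N + 1):
        s, m = states[-1], N - k
        probs = [R[s][j] * powers[m][j][b] for j in range(n)]
        states.append(rng.choices(range(n), weights=probs)[0])
    # 4. Drop virtual (self-to-self) jumps and keep only real substitutions.
    path = [(0.0, a)]
    for time, state in zip(times, states[1:]):
        if state != path[-1][1]:
            path.append((time, state))
    return path


if __name__ == "__main__":
    pi = [0.1, 0.2, 0.3, 0.4]
    exch = [[0, 1, 2, 1], [1, 0, 1, 3], [2, 1, 0, 1], [1, 3, 1, 0]]
    Q = gtr_rate_matrix(exch, pi)
    print(transition_probs(Q, 0.5)[0])
    print(sample_path(Q, 0, 3, 0.5, random.Random(1)))
```

For a site with relative rate r, the same branch would use transition_probs(Q, r * t). Under a continuous gamma prior, r can then be updated by MCMC instead of being summed over a fixed set of discrete categories, which is the contrast with BASEML that the abstract draws; how BYPASSR actually parameterizes and updates r is not shown here.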
Pages: 259-269
Page count: 11