Evolutionary distances for protein-coding sequences: Modeling site-specific residue frequencies

Cited by: 222
Authors
Halpern, AL
Bruno, WJ
Affiliations
[1] Univ Calif Los Alamos Natl Lab, Los Alamos, NM USA
[2] Santa Fe Inst, Santa Fe, NM 87501 USA
Keywords
site-specific frequencies; evolutionary distances; selection; maximum likelihood; saturation; variable-rate models
DOI
10.1093/oxfordjournals.molbev.a025995
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Estimation of evolutionary distances from coding sequences must take into account protein-level selection to avoid relative underestimation of longer evolutionary distances. Current modeling of selection via site-to-site rate heterogeneity generally neglects another aspect of selection, namely position-specific amino acid frequencies. These frequencies determine the maximum dissimilarity expected for highly diverged but functionally and structurally conserved sequences, and hence are crucial for estimating long distances. We introduce a codon-level model of coding sequence evolution in which position-specific amino acid frequencies are free parameters. In our implementation, these are estimated from an alignment using methods described previously. We use simulations to demonstrate the importance and feasibility of modeling such behavior; our model produces linear distance estimates over a wide range of distances, while several alternative models underestimate long distances relative to short distances. Site-to-site differences in rates, as well as synonymous/nonsynonymous and first/second/third-codon-position differences, arise as a natural consequence of the site-to-site differences in amino acid frequencies.
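The abstract's point that position-specific frequencies set the maximum expected dissimilarity can be illustrated with a small calculation. The sketch below is not the codon-level model from the paper. It assumes only that, at saturation, the residues observed at a site in two sequences are independent draws from that site's equilibrium frequencies, so the expected mismatch probability is 1 minus the sum of squared frequencies. The function name and the example frequency vectors are hypothetical.

```python
# Illustrative sketch only, not the authors' codon-level model.
# Assumption: at saturation, the two residues at a site are independent
# draws from the site's equilibrium amino acid frequencies.

def saturation_dissimilarity(freqs):
    """Expected mismatch probability at one site between two fully diverged sequences."""
    total = sum(freqs)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    probs = [f / total for f in freqs]  # normalize so the values sum to 1
    return 1.0 - sum(p * p for p in probs)

# Hypothetical sites: one unconstrained (all 20 amino acids equally likely),
# one constrained to three residues with unequal frequencies.
uniform_site = [1.0] * 20
constrained_site = [0.7, 0.2, 0.1]

uniform_max = saturation_dissimilarity(uniform_site)
constrained_max = saturation_dissimilarity(constrained_site)
print(f"unconstrained site: {uniform_max:.2f}")
print(f"constrained site:   {constrained_max:.2f}")
```

Under this assumption the constrained site saturates at a much lower dissimilarity than the unconstrained one, so a model that treats every site as unconstrained would read a saturated constrained site as only moderately diverged and underestimate long distances.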
Pages: 910-917
Number of pages: 8