Efficient methods for estimating amino acid replacement rates

Cited by: 13
Author
Arvestad, Lars [1]
Affiliation
[1] Royal Inst Technol, Albanova Univ Ctr, Stockholm Bioinformat Ctr, KTH, SE-10044 Stockholm, Sweden
Keywords
amino acid replacement; protein evolution; general time-reversible model; Markov model; parameter estimation
DOI
10.1007/s00239-004-0113-9
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Replacement rate matrices describe the process of evolution at one position in a protein and are used in many applications where proteins are studied with an evolutionary perspective. Several general matrices have been suggested and have proved to be good approximations of the real process. However, there are data for which general matrices are inappropriate, for example, special protein families, certain lineages in the tree of life, or particular parts of proteins. Analysis of such data could benefit from adaptation of a data-specific rate matrix. This paper suggests two new methods for estimating replacement rate matrices from independent pairwise protein sequence alignments and also carefully studies the Müller-Vingron resolvent method. Comprehensive tests on synthetic datasets show that both new methods perform better than the resolvent method in a variety of settings. The best method is furthermore demonstrated to be robust on small datasets as well as practical on very large datasets of real data. Neither short nor divergent sequence pairs have to be discarded, making the method economical with data. A generalization to multialignment data is suggested and used in a test on protein-domain family phylogenies, where it is shown that the method offers family-specific rate matrices that often have a significantly better likelihood than a general matrix.
Pages: 663-673
Number of pages: 11