On reduced amino acid alphabets for phylogenetic inference

Cited by: 121
Authors
Susko, Edward [1]
Roger, Andrew J.
Affiliations
[1] Dalhousie Univ, Dept Math & Stat, Halifax, NS B3H 3J5, Canada
[2] Dalhousie Univ, Dept Biochem & Mol Biol, Halifax, NS B3H 3J5, Canada
Keywords
DOI
10.1093/molbev/msm144
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
We investigate the use of Markov models of evolution for reduced amino acid alphabets or bins of amino acids. The use of reduced amino acid alphabets can ameliorate effects of model misspecification and saturation. We present algorithms for 2 different ways of automating the construction of bins: minimizing criteria based on properties of rate matrices and minimizing criteria based on properties of alignments. By simulation, we show that in the absence of model misspecification, the loss of information due to binning is found to be insubstantial, and the use of Markov models at the binned level is found to be almost as effective as the more appropriate missing data approach. By applying these approaches to real data sets where compositional heterogeneity and/or saturation appear to be causing biased tree estimation, we find that binning can improve topological estimation in practice.
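As a rough illustration of what recoding sequences into bins of amino acids looks like, the sketch below maps each residue to a bin label. It uses the widely cited six-group Dayhoff classes as a fixed example scheme; this is an assumption made for illustration and is not one of the automatically constructed binnings the abstract describes. The names `DAYHOFF6`, `AA_TO_BIN`, `recode`, and `recode_alignment` are hypothetical.

```python
# Six-group Dayhoff classes, used here only as an example binning scheme
# (an assumption; the paper's bins are derived by minimizing criteria on
# rate matrices or alignments, which this sketch does not attempt).
DAYHOFF6 = {
    "AGPST": "0",
    "DENQ": "1",
    "HKR": "2",
    "ILMV": "3",
    "FWY": "4",
    "C": "5",
}

# Flatten the groups into a residue -> bin-label lookup table.
AA_TO_BIN = {aa: label for group, label in DAYHOFF6.items() for aa in group}


def recode(seq, mapping=AA_TO_BIN, missing="-"):
    """Replace each residue with its bin label.

    Symbols not covered by the mapping (gaps, ambiguity codes such as X)
    are written as `missing`.
    """
    return "".join(mapping.get(ch, missing) for ch in seq.upper())


def recode_alignment(alignment):
    """Recode every sequence in a {name: sequence} dictionary."""
    return {name: recode(seq) for name, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"taxon1": "ACDE", "taxon2": "MKV-X"}
    for name, binned in recode_alignment(toy).items():
        print(name, binned)
```

The recoded alignment can then be analyzed with a Markov model on the smaller state space; choosing the bins themselves is the part the paper automates.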
Pages: 2139-2150
Page count: 12
Related papers
38 entries in total
  • [1] Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences
    Ababneh, F
    Jermiin, LS
    Ma, CS
    Robinson, J
    [J]. BIOINFORMATICS, 2006, 22 (10) : 1225 - 1231
  • [2] Abramowitz M., 1972, HDB MATH FUNCTIONS F
  • [3] Evidence for a clade of nematodes, arthropods and other moulting animals
    Aguinaldo, AMA
    Turbeville, JM
    Linford, LS
    Rivera, MC
    Garey, JR
    Raff, RA
    Lake, JA
    [J]. NATURE, 1997, 387 (6632) : 489 - 493
  • [4] Simplifying amino acid alphabets by means of a branch and bound algorithm and substitution matrices
    Cannata, N
    Toppo, S
    Romualdi, C
    Valle, G
    [J]. BIOINFORMATICS, 2002, 18 (08) : 1102 - 1108
  • [5] Dayhoff M.O., 1978, ATLAS PROTEIN SEQ ST, V5
  • [6] Genome-scale evidence of the nematode-arthropod clade
    Dopazo, H
    Dopazo, J
    [J]. GENOME BIOLOGY, 2005, 6 (05)
  • [7] Mitochondria and hydrogenosomes are two forms of the same fundamental organelle
    Embley, TM
    van der Giezen, M
    Horner, DS
    Dyal, PL
    Foster, P
    [J]. PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES, 2003, 358 (1429) : 191 - 202
  • [8] Compositional bias may affect both DNA-based and protein-based phylogenetic reconstructions
    Foster, PG
    Hickey, DA
    [J]. JOURNAL OF MOLECULAR EVOLUTION, 1999, 48 (03) : 284 - 290
  • [9] The chloroplast genome of Nymphaea alba: Whole-genome analyses and the problem of identifying the most basal angiosperm
    Goremykin, VV
    Hirsch-Ernst, KI
    Wölfl, S
    Hellwig, FH
    [J]. MOLECULAR BIOLOGY AND EVOLUTION, 2004, 21 (07) : 1445 - 1454
  • [10] Analysis of the Amborella trichopoda chloroplast genome sequence suggests that Amborella is not a basal angiosperm
    Goremykin, VV
    Hirsch-Ernst, KI
    Wölfl, S
    Hellwig, FH
    [J]. MOLECULAR BIOLOGY AND EVOLUTION, 2003, 20 (09) : 1499 - 1505