Performance, Accuracy, and Web Server for Evolutionary Placement of Short Sequence Reads under Maximum Likelihood

Times cited: 371
Authors
Berger, Simon A. [1]
Krompass, Denis [1]
Stamatakis, Alexandros [1]
Affiliation
[1] Heidelberg Inst Theoret Studies, Exelixis Lab, Sci Comp Grp, D-69118 Heidelberg, Germany
Keywords
Maximum likelihood; metagenomics; phylogenetic placement; RAxML; short sequence reads; compositional heterogeneity; phylogenetic classification; DNA sequences; confidence; communities; diversity; alignment; database; BLAST; model
DOI
10.1093/sysbio/syr010
CLC classification
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
We present an evolutionary placement algorithm (EPA) and a Web server for rapidly assigning sequence fragments (short reads) to edges of a given phylogenetic tree under the maximum-likelihood model. We evaluate the accuracy of the algorithm on several real-world data sets and compare it with placement by pairwise sequence comparison using edit distances and BLAST. We introduce two placement algorithms: a slow, accurate one and a fast, less accurate one. For the slow algorithm, we develop additional heuristics that bring its run times close to those of the fast version with only a small loss of accuracy. With these heuristics, the run time of the more accurate algorithm is comparable to that of a simple BLAST search for data sets with a large number of short query sequences. Moreover, the accuracy of the EPA is significantly higher than that of BLAST-based placement, in particular when the taxon sample in the reference topology is sparse or inadequate. Our algorithm, which has been integrated into RAxML, therefore provides an equally fast but more accurate alternative to BLAST for tree-based inference of the evolutionary origin and composition of short sequence reads. We are also actively developing a Web server that offers a freely available service for computing read placements on trees using the EPA.
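To make the idea of placing a read on the edges of a fixed tree under maximum likelihood concrete, here is a minimal, hypothetical Python sketch. It is not RAxML's EPA implementation. It assumes the Jukes-Cantor (JC69) model, a query already aligned to the reference alignment (with '-' in columns the read does not cover, treated as missing data), insertion at the midpoint of each edge with a fixed pendant branch length of 0.1, and no branch-length optimization. The tree, node names (`n1`, `n2`), taxa, and sequences are made-up toy data.

```python
import math

BASES = "ACGT"
INSERT = "_insertion"  # hypothetical name for the node created on the split edge
QUERY = "_query"       # hypothetical name for the query (read) leaf


def jc_prob(i, j, t):
    """JC69 probability of going from state i to state j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def tip_vector(ch):
    """Tip likelihoods; gaps and unknown characters are treated as missing data."""
    if ch in BASES:
        return [1.0 if b == ch else 0.0 for b in BASES]
    return [1.0] * 4


def conditional(node, parent, adj, seqs, site):
    """Felsenstein pruning: likelihood vector of the subtree below `node`, seen from `parent`."""
    if node in seqs:
        return tip_vector(seqs[node][site])
    vec = [1.0] * 4
    for child, t in adj[node]:
        if child == parent:
            continue
        cl = conditional(child, node, adj, seqs, site)
        for i in range(4):
            vec[i] *= sum(jc_prob(i, j, t) * cl[j] for j in range(4))
    return vec


def log_likelihood(adj, seqs, root):
    """Sum of per-site log-likelihoods (uniform base frequencies), evaluated at inner node `root`."""
    n_sites = len(next(iter(seqs.values())))
    total = 0.0
    for site in range(n_sites):
        vec = conditional(root, None, adj, seqs, site)
        total += math.log(sum(0.25 * v for v in vec))
    return total


def insert_query(edges, target, pendant):
    """Copy the tree, split edge `target` at its midpoint, and attach the query leaf there."""
    adj = {}

    def link(a, b, length):
        adj.setdefault(a, []).append((b, length))
        adj.setdefault(b, []).append((a, length))

    u, v, t = target
    for edge in edges:
        if edge != target:
            link(*edge)
    link(u, INSERT, t / 2)
    link(INSERT, v, t / 2)
    link(INSERT, QUERY, pendant)
    return adj


def place(edges, ref_seqs, query, pendant=0.1):
    """Score every edge of the fixed reference tree; return (edge, log-likelihood), best first."""
    seqs = dict(ref_seqs)
    seqs[QUERY] = query
    scored = []
    for edge in edges:
        adj = insert_query(edges, edge, pendant)
        scored.append(((edge[0], edge[1]), log_likelihood(adj, seqs, INSERT)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy unrooted reference tree ((A,B),(C,D)) with inner nodes n1 and n2 (hypothetical data).
EDGES = [("n1", "A", 0.1), ("n1", "B", 0.1), ("n1", "n2", 0.1),
         ("n2", "C", 0.1), ("n2", "D", 0.1)]
REF = {"A": "ACGTGTGC", "B": "ACTCGTGC", "C": "ACTCATGT", "D": "ACTCATGT"}

if __name__ == "__main__":
    for (u, v), ll in place(EDGES, REF, "ACGTG---"):
        print(f"{u}-{v}: {ll:.3f}")
```

Here the read shares two characters that only taxon A carries, so the edge leading to A scores highest. Because uncovered columns are treated as missing data, they contribute the same likelihood for every candidate edge, so only the columns the read covers discriminate between placements. The sketch scores every edge exhaustively and does not model the speed heuristics described in the abstract.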
Pages: 291-302
Page count: 12