pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree

Times cited: 736
Authors
Matsen, Frederick A. [1]
Kodner, Robin B. [2, 3]
Armbrust, E. Virginia [2]
Affiliations
[1] Fred Hutchinson Canc Res Ctr, Computat Biol Program, Seattle, WA 98104 USA
[2] Univ Washington, Sch Oceanog, Seattle, WA 98195 USA
[3] Univ Washington, Friday Harbor Labs, Friday Harbor, WA 98250 USA
Keywords
DNA-SEQUENCES; METAGENOMIC ANALYSIS; MICROBIAL COMMUNITIES; EVOLUTIONARY TREES; MARINE; PROTEIN; DIVERSITY; GENE; CLASSIFICATION; INFERENCE
DOI
10.1186/1471-2105-11-538
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Likelihood-based phylogenetic inference is generally considered the most reliable method for classifying unknown sequences. However, traditional likelihood-based phylogenetic methods cannot be applied to the large volumes of short reads produced by next-generation sequencing, because of their computational cost and the lack of phylogenetic signal in short reads. "Phylogenetic placement", in which a reference tree is held fixed and unknown query sequences are placed onto it via a reference alignment, is a way to bring the inferential power of likelihood-based approaches to large data sets.

Results: This paper introduces pplacer, a software package for phylogenetic placement and subsequent visualization. The algorithm can place twenty thousand short reads per hour per processor on a reference tree of one thousand taxa. It has essentially linear time and memory complexity in the number of reference taxa, and it is easy to run in parallel. Pplacer calculates the posterior probability of a placement on each edge, which is a statistically rigorous way of quantifying uncertainty on an edge-by-edge basis. It can also report the positional uncertainty of a query sequence by calculating the expected distance between placement locations. This measure is crucial for estimating uncertainty when the reference tree is well sampled. The software visualizes results using branch thickness and color to represent the number of placements and their uncertainty. A simulation study using reads generated from 631 COG alignments shows a high level of accuracy for phylogenetic placement over a wide range of alignment diversity. It also demonstrates the power of edge uncertainty estimates to measure placement confidence.

Conclusions: Pplacer enables efficient phylogenetic placement and subsequent visualization, making likelihood-based phylogenetic methodology practical for large collections of reads. It is freely available as source code, binaries, and a web service.
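The two uncertainty measures named in the abstract can be illustrated with a short Python sketch. This is not pplacer's implementation, and it makes four assumptions:

- Per-edge maximum log-likelihoods for a query are already computed.
- These log-likelihoods are normalized under a uniform prior over edges. The result is a likelihood-weight-style ratio, not the full Bayesian posterior, which also integrates over branch lengths.
- The expected distance between placement locations (EDPL) is the probability-weighted sum of pairwise path distances between placement locations.
- The distance matrix is given as input.

The function names and the distance matrix are hypothetical.

```python
import math
from typing import List, Sequence


def placement_probabilities(log_likelihoods: Sequence[float]) -> List[float]:
    """Turn per-edge log-likelihoods into probabilities that sum to 1.

    Assumption: uniform prior over candidate edges, using each edge's
    maximum likelihood. pplacer's full posterior also integrates over
    branch lengths, which this sketch does not do.
    """
    best = max(log_likelihoods)
    # Subtract the maximum before exponentiating (log-sum-exp trick)
    # so very negative log-likelihoods do not all underflow to zero.
    weights = [math.exp(ll - best) for ll in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]


def expected_distance_between_placements(
    probs: Sequence[float], dist: Sequence[Sequence[float]]
) -> float:
    """EDPL sketch: sum over all i, j of p_i * p_j * d(i, j).

    dist[i][j] is a hypothetical, precomputed path length along the
    reference tree between placement locations i and j.
    """
    n = len(probs)
    return sum(probs[i] * probs[j] * dist[i][j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    # Hypothetical query: two adjacent short edges fit equally well,
    # while a distant edge fits much worse.
    log_lik = [-1200.0, -1200.0, -1215.0]
    dist = [
        [0.0, 0.01, 0.8],
        [0.01, 0.0, 0.8],
        [0.8, 0.8, 0.0],
    ]
    p = placement_probabilities(log_lik)
    print("per-edge probabilities:", [round(x, 4) for x in p])
    print("EDPL:", round(expected_distance_between_placements(p, dist), 4))
```

In this made-up example, each of the two neighboring edges receives a probability close to 0.5. Judged edge by edge, the placement therefore looks uncertain. Yet the EDPL is only about 0.005, because the competing locations are very close on the tree. This illustrates why a distance-based measure is useful when the reference tree is well sampled.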
Pages: 16