PathoScope 2.0: a complete computational framework for strain identification in environmental or clinical sequencing samples

Cited by: 164
Authors
Hong, Changjin [1]
Manimaran, Solaiappan [1]
Shen, Ying [1]
Perez-Rogers, Joseph F. [1, 2]
Byrd, Allyson L. [2]
Castro-Nallar, Eduardo [3]
Crandall, Keith A. [3]
Johnson, William Evan [1, 2]
Affiliations
[1] Boston Univ, Sch Med, Computat Biomed, Boston, MA 02118 USA
[2] Boston Univ, Bioinformat Program, Boston, MA 02125 USA
[3] George Washington Univ, Computat Biol Inst, Ashburn, VA 20147 USA
Source
MICROBIOME | 2014 / Volume 2
Funding
US National Science Foundation;
Keywords
PHYLOGENETIC CLASSIFICATION; DISCOVERY;
DOI
10.1186/2049-2618-2-33
Chinese Library Classification number
Q93 [Microbiology];
Discipline classification code
071005 ; 100705 ;
Abstract
Background: Recent innovations in sequencing technologies have provided researchers with the ability to rapidly characterize the microbial content of an environmental or clinical sample with unprecedented resolution. These approaches are producing a wealth of information that is providing novel insights into the microbial ecology of the environment and human health. However, these sequencing-based approaches produce large and complex datasets that require efficient and sensitive computational analysis workflows. Many tools for analyzing metagenomic sequencing data have emerged recently; however, these approaches often suffer from issues of specificity and efficiency, and typically do not include a complete metagenomic analysis framework.
Results: We present PathoScope 2.0, a complete bioinformatics framework for rapidly and accurately quantifying the proportions of reads from individual microbial strains present in metagenomic sequencing data from environmental or clinical samples. The pipeline performs all necessary computational analysis steps, including reference genome library extraction and indexing, read quality control and alignment, strain identification, and summarization and annotation of results. We rigorously evaluated PathoScope 2.0 using simulated data and data from the 2011 outbreak of Shiga-toxigenic Escherichia coli O104:H4.
Conclusions: The results show that PathoScope 2.0 is a complete, highly sensitive, and efficient approach for metagenomic analysis that outperforms alternative approaches in scope, speed, and accuracy. The PathoScope 2.0 pipeline software is freely available for download at: http://sourceforge.net/projects/pathoscope/.
Pages: 15