Centrifuge: rapid and sensitive classification of metagenomic sequences

Cited by: 863
Authors
Kim, Daehwan [1]
Song, Li [1, 2]
Breitwieser, Florian P. [1]
Salzberg, Steven L. [1, 2, 3, 4]
Affiliations
[1] Johns Hopkins Univ, Sch Med, McKusick Nathans Inst Genet Med, Ctr Computat Biol, Baltimore, MD 21205 USA
[2] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[3] Johns Hopkins Univ, Dept Biomed Engn, Baltimore, MD 21205 USA
[4] Johns Hopkins Univ, Dept Biostat, Baltimore, MD 21205 USA
Funding
US National Science Foundation; US National Institutes of Health
Keywords
BURROWS-WHEELER TRANSFORM; READ ALIGNMENT; RNA-SEQ; DNA-SEQUENCES; QUANTIFICATION; TRANSMISSION; ULTRAFAST; GENOME;
DOI
10.1101/gr.210641.116
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Centrifuge is a novel microbial classification engine that enables rapid, accurate, and sensitive labeling of reads and quantification of species on desktop computers. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.2 GB for 4078 bacterial and 200 archaeal genomes) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes. Together, these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers. Because of its space-optimized indexing schemes, Centrifuge also makes it possible to index the entire NCBI nonredundant nucleotide sequence database (a total of 109 billion bases) with an index size of 69 GB, in contrast to k-mer-based indexing schemes, which require far more extensive space.
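The abstract names the BWT and the FM index as the basis of the indexing scheme. As a rough illustration of the general technique only, the following Python sketch builds a BWT by naive rotation sorting and counts exact pattern occurrences with FM-index backward search. It is not Centrifuge's implementation: the function names `bwt`, `build_fm_index`, and `count_occurrences` are hypothetical, the tables are stored uncompressed, and the input is assumed not to contain the sentinel character `$`.

```python
def bwt(text):
    """Burrows-Wheeler transform via naive rotation sort (illustrative, O(n^2 log n))."""
    s = text + "$"  # assumption: "$" does not occur in text and sorts before all symbols
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(bwt_str):
    """Build the two FM-index tables from a BWT string.

    C[c]      = number of characters in the text strictly smaller than c
    occ[c][i] = number of occurrences of c in bwt_str[:i]
    """
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1

    C = {}
    total = 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    # Uncompressed occurrence table; real tools sample or compress this.
    occ = {ch: [0] * (len(bwt_str) + 1) for ch in counts}
    for i, ch in enumerate(bwt_str):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ


def count_occurrences(pattern, C, occ, n):
    """Count exact occurrences of pattern by backward search over n = len(bwt_str) rows."""
    lo, hi = 0, n  # half-open range of sorted rotations matching the current suffix
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("banana")
    C, occ = build_fm_index(transformed)
    print(transformed)                                        # annb$aa
    print(count_occurrences("ana", C, occ, len(transformed)))  # 2
```

Backward search extends the match one character at a time from the end of the pattern, so query time depends on the pattern length rather than on the size of the indexed text. That property is what makes FM-index lookups attractive for large reference collections.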
Pages: 1721-1729
Page count: 9
Related papers
31 in total
[1]   PHYLOGENETIC IDENTIFICATION AND IN-SITU DETECTION OF INDIVIDUAL MICROBIAL-CELLS WITHOUT CULTIVATION [J].
AMANN, RI ;
LUDWIG, W ;
SCHLEIFER, KH .
MICROBIOLOGICAL REVIEWS, 1995, 59 (01) :143-169
[2]   Emergence of Zaire Ebola Virus Disease in Guinea [J].
Baize, Sylvain ;
Pannetier, Delphine ;
Oestereich, Lisa ;
Rieger, Toni ;
Koivogui, Lamine ;
Magassouba, N'Faly ;
Soropogui, Barre ;
Sow, Mamadou Saliou ;
Keita, Sakoba ;
De Clerck, Hilde ;
Tiffany, Amanda ;
Dominguez, Gemma ;
Loua, Mathieu ;
Traore, Alexis ;
Kolie, Moussa ;
Malano, Emmanuel Roland ;
Heleze, Emmanuel ;
Bocquin, Anne ;
Mely, Stephane ;
Raoul, Herve ;
Caro, Valerie ;
Cadar, Daniel ;
Gabriel, Martin ;
Pahlmann, Meike ;
Tappe, Dennis ;
Schmidt-Chanasit, Jonas ;
Impouma, Benido ;
Diallo, Abdoul Karim ;
Formenty, Pierre ;
Van Herp, Michel ;
Guenther, Stephan .
NEW ENGLAND JOURNAL OF MEDICINE, 2014, 371 (15) :1418-1425
[3]   PhymmBL expanded: confidence scores, custom databases, parallelization and more [J].
Brady, Arthur ;
Salzberg, Steven .
NATURE METHODS, 2011, 8 (05) :367-367
[4]  
Brady A, 2009, NAT METHODS, V6, P673, DOI [10.1038/nmeth.1358, 10.1038/NMETH.1358]
[5]  
Burrows M, 1994, BLOCK SORTING LOSSLE
[6]   Type material in the NCBI Taxonomy Database [J].
Federhen, Scott .
NUCLEIC ACIDS RESEARCH, 2015, 43 (D1) :D1086-D1098
[7]  
Ferragina P., 2000, P 41 IEEE S FDN COMP
[8]   Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak [J].
Gire, Stephen K. ;
Goba, Augustine ;
Andersen, Kristian G. ;
Sealfon, Rachel S. G. ;
Park, Daniel J. ;
Kanneh, Lansana ;
Jalloh, Simbirie ;
Momoh, Mambu ;
Fullah, Mohamed ;
Dudas, Gytis ;
Wohl, Shirlee ;
Moses, Lina M. ;
Yozwiak, Nathan L. ;
Winnicki, Sarah ;
Matranga, Christian B. ;
Malboeuf, Christine M. ;
Qu, James ;
Gladden, Adrianne D. ;
Schaffner, Stephen F. ;
Yang, Xiao ;
Jiang, Pan-Pan ;
Nekoui, Mahan ;
Colubri, Andres ;
Coomber, Moinya Ruth ;
Fonnie, Mbalu ;
Moigboi, Alex ;
Gbakie, Michael ;
Kamara, Fatima K. ;
Tucker, Veronica ;
Konuwa, Edwin ;
Saffa, Sidiki ;
Sellu, Josephine ;
Jalloh, Abdul Azziz ;
Kovoma, Alice ;
Koninga, James ;
Mustapha, Ibrahim ;
Kargbo, Kandeh ;
Foday, Momoh ;
Yillah, Mohamed ;
Kanneh, Franklyn ;
Robert, Willie ;
Massally, James L. B. ;
Chapman, Sinead B. ;
Bochicchio, James ;
Murphy, Cheryl ;
Nusbaum, Chad ;
Young, Sarah ;
Birren, Bruce W. ;
Grant, Donald S. ;
Schieffelin, John S. .
SCIENCE, 2014, 345 (6202) :1369-1372
[9]  
Jain M, 2015, NAT METHODS, V12, P351, DOI [10.1038/NMETH.3290, 10.1038/nmeth.3290]
[10]   Tapping into microbial diversity [J].
Keller, M ;
Zengler, K .
NATURE REVIEWS MICROBIOLOGY, 2004, 2 (02) :141-150