Using populations of human and microbial genomes for organism detection in metagenomes

Times cited: 24
Authors
Ames, Sasha K. [1]
Gardner, Shea N. [2]
Marti, Jose Manuel [3]
Slezak, Tom R. [2]
Gokhale, Maya B. [1]
Allen, Jonathan E. [2]
Affiliations
[1] Lawrence Livermore National Laboratory, Center for Applied Scientific Computing, Livermore, CA 94550, USA
[2] Lawrence Livermore National Laboratory, Global Security Computing Applications Division, Livermore, CA 94550, USA
[3] Instituto de Física Corpuscular (CSIC-Universitat de València), E-46980 Valencia, Spain
Keywords
Pathogen identification; Classification; Alignment; Contamination; Accurate; Samples
DOI
10.1101/gr.184879.114
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Shotgun metagenomic sequencing (SMS) offers a powerful way to identify causative disease agents in human patients when other targeted diagnostics fail. Numerous technical challenges remain, however, before SMS can move beyond the role of a research tool. Accurately separating the known and unknown organism content remains difficult, particularly when SMS is applied as a last resort. The true amount of human DNA that remains in a sample after screening against the human reference genome and filtering nonbiological components left from library preparation has previously been underreported. In this study, we create the most comprehensive collection of microbial and reference-free human genetic variation available, stored in a database optimized for efficient metagenomic search, by extracting sequences from GenBank and the 1000 Genomes Project. The results reveal new human sequences in individual Human Microbiome Project (HMP) samples. Individual samples contain up to 95% human sequence, and 4% of the individual HMP samples contain 10% or more human reads. Left unidentified, human reads can complicate and slow down further analysis, lead to inaccurately labeled microbial taxa, and ultimately raise privacy concerns as more human genome data are collected.
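The two headline figures above (the maximum per-sample human fraction and the share of samples with 10% or more human reads) are simple ratios over per-sample read counts. The following is a minimal sketch, assuming reads have already been labeled human or non-human by some upstream classifier; the function names and the example counts are hypothetical and are not taken from the paper or its data.

```python
from typing import Dict, Tuple


def human_fraction(human_reads: int, total_reads: int) -> float:
    """Fraction of a sample's reads labeled human (0.0 for an empty sample)."""
    if total_reads <= 0:
        return 0.0
    return human_reads / total_reads


def share_above_threshold(
    samples: Dict[str, Tuple[int, int]], threshold: float = 0.10
) -> float:
    """Share of samples whose human-read fraction is at or above `threshold`.

    `samples` maps a sample ID to (reads labeled human, total reads).
    """
    if not samples:
        return 0.0
    flagged = sum(
        1
        for human, total in samples.values()
        if human_fraction(human, total) >= threshold
    )
    return flagged / len(samples)


# Hypothetical per-sample counts: (reads labeled human, total reads).
example_samples = {
    "S1": (950, 1000),
    "S2": (50, 1000),
    "S3": (120, 1000),
    "S4": (0, 1000),
}

# Highest per-sample human fraction, then share of samples at >= 10% human reads.
print(max(human_fraction(h, t) for h, t in example_samples.values()))
print(share_above_threshold(example_samples))
```

With these made-up counts the sketch prints 0.95 and 0.5. The threshold is a parameter so the same summary can be recomputed at other cutoffs.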
Pages: 1056-1067
Number of pages: 12