Large-scale 16S gene assembly using metagenomics shotgun sequences

Times cited: 10
Authors
Zeng, Feng [1]
Wang, Zicheng [2,3]
Wang, Ying [1]
Zhou, Jizhong [4,5,6,7]
Chen, Ting [2,8,9]
Affiliations
[1] Xiamen Univ, Dept Automat, Xiamen 361005, Fujian, Peoples R China
[2] Tsinghua Univ, Bioinformat Div, TNLIST, Beijing 100084, Peoples R China
[3] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
[4] Univ Oklahoma, Inst Environm Genom, Norman, OK 73019 USA
[5] Univ Oklahoma, Dept Microbiol & Plant Biol, Norman, OK 73019 USA
[6] Tsinghua Univ, Sch Environm, State Key Joint Lab Environm Simulat & Pollut Con, Beijing 100084, Peoples R China
[7] Lawrence Berkeley Natl Lab, Div Earth Sci, Berkeley, CA 94720 USA
[8] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[9] Univ Southern Calif, Program Computat Biol & Bioinformat, Los Angeles, CA 90089 USA
Funding
National Natural Science Foundation of China
Keywords
ALIGNMENT; RECONSTRUCTION; INFERENCE; DATABASE; GENOMES
DOI
10.1093/bioinformatics/btx018
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline codes
071010; 081704
Abstract
Motivation: Combining a 16S rRNA (16S) gene database with metagenomic shotgun sequences promises unbiased identification of both known and novel microbes.
Results: To this end, we report reference-based ribosome assembly (RAMBL), a computational pipeline that integrates taxonomic tree search and Dirichlet process clustering to reconstruct full-length 16S gene sequences from metagenomic sequencing data with high accuracy. By benchmarking against synthetic and real shotgun sequences, we demonstrated that RAMBL's full-length 16S gene assemblies were a good proxy for known and putative microbes, including the Candidate Phyla Radiation. We found that 30-40% of bacterial genera in terrestrial and intestinal biomes have no closely related genome sequences. We also observed that RAMBL determined environmental microbial diversity more accurately and yielded better disease classification, suggesting that full-length 16S gene assemblies are a powerful alternative to marker gene sets and 16S short reads. RAMBL is the first tool to provide access to full-length 16S gene sequences in near-terabase-scale metagenomic shotgun data, markedly improving metagenomic data analysis and interpretation.
Supplementary information: Supplementary data are available at Bioinformatics online.
Pages: 1447-1456
Page count: 10