A unified catalog of 204,938 reference genomes from the human gut microbiome

Cited by: 729
Authors
Almeida, Alexandre [1, 2]
Nayfach, Stephen [3, 4]
Boland, Miguel [1]
Strozzi, Francesco [5]
Beracochea, Martin [1]
Shi, Zhou Jason [6, 7]
Pollard, Katherine S. [6, 7, 8, 9, 10, 11]
Sakharova, Ekaterina [1]
Parks, Donovan H. [12]
Hugenholtz, Philip [12]
Segata, Nicola [13]
Kyrpides, Nikos C. [3, 4]
Finn, Robert D. [1]
Affiliations
[1] European Bioinformat Inst EMBL EBI, Wellcome Genome Campus, Hinxton, England
[2] Wellcome Sanger Inst, Wellcome Genome Campus, Hinxton, England
[3] US DOE, Joint Genome Inst, Walnut Creek, CA USA
[4] Lawrence Berkeley Natl Lab, Environm Genom & Syst Biol Div, Berkeley, CA USA
[5] Enterome Biosci, Paris, France
[6] Gladstone Inst, San Francisco, CA USA
[7] Chan Zuckerberg Biohub, San Francisco, CA USA
[8] Univ Calif San Francisco, Inst Human Genet, San Francisco, CA 94143 USA
[9] Univ Calif San Francisco, Inst Computat Hlth Sci, San Francisco, CA 94143 USA
[10] Univ Calif San Francisco, Quantitat Biol, San Francisco, CA 94143 USA
[11] Univ Calif San Francisco, Dept Epidemiol & Biostat, San Francisco, CA USA
[12] Univ Queensland, Sch Chem & Mol Biosci, Australian Ctr Ecogen, Brisbane, Qld, Australia
[13] Univ Trento, CIBIO Dept, Trento, Italy
Funding
European Research Council; UK Biotechnology and Biological Sciences Research Council;
Keywords
READ ALIGNMENT; ASSEMBLED GENOMES; ANALYSIS RESOURCE; ANNOTATION; BACTERIAL; GENES; VERSATILE; COVERAGE; DATABASE; CULTURE;
DOI
10.1038/s41587-020-0603-3
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005; 0836; 090102; 100705;
Abstract
Comprehensive, high-quality reference genomes are required for functional characterization and taxonomic assignment of the human gut microbiota. We present the Unified Human Gastrointestinal Genome (UHGG) collection, comprising 204,938 nonredundant genomes from 4,644 gut prokaryotes. These genomes encode >170 million protein sequences, which we collated in the Unified Human Gastrointestinal Protein (UHGP) catalog. The UHGP more than doubles the number of gut proteins in comparison to those present in the Integrated Gene Catalog. More than 70% of the UHGG species lack cultured representatives, and 40% of the UHGP lack functional annotations. Intraspecies genomic variation analyses revealed a large reservoir of accessory genes and single-nucleotide variants, many of which are specific to individual human populations. The UHGG and UHGP collections will enable studies linking genotypes to phenotypes in the human gut microbiome.
Pages: 105-114
Page count: 10