A unified catalog of 204,938 reference genomes from the human gut microbiome

Citations: 654
Authors
Almeida, Alexandre [1, 2]
Nayfach, Stephen [3, 4]
Boland, Miguel [1]
Strozzi, Francesco [5]
Beracochea, Martin [1]
Shi, Zhou Jason [6, 7]
Pollard, Katherine S. [6, 7, 8, 9, 10, 11]
Sakharova, Ekaterina [1]
Parks, Donovan H. [12]
Hugenholtz, Philip [12]
Segata, Nicola [13]
Kyrpides, Nikos C. [3, 4]
Finn, Robert D. [1]
Affiliations
[1] European Bioinformat Inst EMBL EBI, Wellcome Genome Campus, Hinxton, England
[2] Wellcome Sanger Inst, Wellcome Genome Campus, Hinxton, England
[3] US DOE, Joint Genome Inst, Walnut Creek, CA USA
[4] Lawrence Berkeley Natl Lab, Environm Genom & Syst Biol Div, Berkeley, CA USA
[5] Enterome Biosci, Paris, France
[6] Gladstone Inst, San Francisco, CA USA
[7] Chan Zuckerberg Biohub, San Francisco, CA USA
[8] Univ Calif San Francisco, Inst Human Genet, San Francisco, CA 94143 USA
[9] Univ Calif San Francisco, Inst Computat Hlth Sci, San Francisco, CA 94143 USA
[10] Univ Calif San Francisco, Quantitat Biol, San Francisco, CA 94143 USA
[11] Univ Calif San Francisco, Dept Epidemiol & Biostat, San Francisco, CA USA
[12] Univ Queensland, Sch Chem & Mol Biosci, Australian Ctr Ecogen, Brisbane, Qld, Australia
[13] Univ Trento, CIBIO Dept, Trento, Italy
Funding
European Research Council; UK Biotechnology and Biological Sciences Research Council;
Keywords
READ ALIGNMENT; ASSEMBLED GENOMES; ANALYSIS RESOURCE; ANNOTATION; BACTERIAL; GENES; VERSATILE; COVERAGE; DATABASE; CULTURE;
DOI
10.1038/s41587-020-0603-3
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005; 0836; 090102; 100705;
Abstract
Comprehensive, high-quality reference genomes are required for functional characterization and taxonomic assignment of the human gut microbiota. We present the Unified Human Gastrointestinal Genome (UHGG) collection, comprising 204,938 nonredundant genomes from 4,644 gut prokaryotes. These genomes encode >170 million protein sequences, which we collated in the Unified Human Gastrointestinal Protein (UHGP) catalog. The UHGP more than doubles the number of gut proteins in comparison to those present in the Integrated Gene Catalog. More than 70% of the UHGG species lack cultured representatives, and 40% of the UHGP lack functional annotations. Intraspecies genomic variation analyses revealed a large reservoir of accessory genes and single-nucleotide variants, many of which are specific to individual human populations. The UHGG and UHGP collections will enable studies linking genotypes to phenotypes in the human gut microbiome.
Pages: 105-114
Number of pages: 10