A deep siamese neural network improves metagenome-assembled genomes in microbiome datasets across different environments

Cited by: 87
Authors
Pan, Shaojun [1, 2]
Zhu, Chengkai [1, 2, 3]
Zhao, Xing-Ming [1, 2, 4, 5]
Coelho, Luis Pedro [1, 2]
Affiliations
[1] Fudan Univ, Inst Sci & Technol Brain Inspired Intelligence, Shanghai, Peoples R China
[2] Minist Educ, Key Lab Computat Neurosci & Brain Inspired Intell, Shanghai, Peoples R China
[3] Fudan Univ, Sch Life Sci, Shanghai, Peoples R China
[4] Fudan Univ, MOE Frontiers Ctr Brain Sci, Shanghai, Peoples R China
[5] Zhangjiang Fudan Int Innovat Ctr, Shanghai, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China
Keywords
INFORMATION; ALGORITHM; ALIGNMENT; CATALOG;
DOI
10.1038/s41467-022-29843-y
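Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract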
Here, the authors present SemiBin, a siamese deep neural network framework that incorporates information from reference genomes and extracts better metagenome-assembled genomes (MAGs) in several host-associated and environmental habitats.

Metagenomic binning is the step in building MAGs in which sequences predicted to originate from the same genome are automatically grouped together. The most widely used binning methods are reference-independent: they operate de novo and enable the recovery of genomes from previously unsampled clades. However, they do not leverage the knowledge in existing databases. Here, we introduce SemiBin, an open-source tool that uses deep siamese neural networks to implement a semi-supervised approach; that is, SemiBin exploits the information in reference genomes while retaining the capability of reconstructing high-quality bins that are outside the reference dataset. Using simulated and real microbiome datasets from several habitats in GMGCv1 (Global Microbial Gene Catalog), including the human gut, non-human guts, and environmental habitats (ocean and soil), we show that SemiBin outperforms existing state-of-the-art binning methods. In particular, compared to other methods, SemiBin returns more high-quality bins with larger taxonomic diversity, including more distinct genera and species.
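The abstract names deep siamese networks and a semi-supervised approach but does not describe the architecture or the loss. As a rough illustration only, the sketch below shows the general siamese idea: two contig feature vectors pass through the same embedding, and a pairwise loss pulls together pairs labelled as belonging to the same genome and pushes apart pairs labelled as different. The single linear layer, the contrastive loss with a margin, and the names `embed`, `contrastive_loss`, `same_genome`, and `margin` are all assumptions made for this example, not SemiBin's actual implementation.

```python
import math

def embed(features, weights):
    """Shared embedding: one linear layer (hypothetical stand-in for a deep network)."""
    return [sum(w * v for w, v in zip(row, features)) for row in weights]

def distance(a, b):
    """Euclidean distance between two embedded vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def contrastive_loss(x1, x2, same_genome, weights, margin=1.0):
    """Generic contrastive loss on one pair of contig feature vectors.

    Both inputs go through the SAME weights, which is what makes the
    network siamese. `same_genome` is the pair label (an assumption here:
    such labels would come from reference-genome information).
    """
    d = distance(embed(x1, weights), embed(x2, weights))
    if same_genome:
        # Pairs from the same genome are penalized for being far apart.
        return d ** 2
    # Pairs from different genomes are penalized only when closer than the margin.
    return max(0.0, margin - d) ** 2

identity = [[1.0, 0.0], [0.0, 1.0]]
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], True, identity))
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], False, identity))
```

In a real setting the weights would be trained by minimizing this loss over many labelled pairs, and the resulting embeddings would then be clustered into bins; both steps are omitted here.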
Pages: 12