metaFlye: scalable long-read metagenome assembly using repeat graphs

Cited by: 512
Authors
Kolmogorov, Mikhail [1]
Bickhart, Derek M. [2]
Behsaz, Bahar [3]
Gurevich, Alexey [4]
Rayko, Mikhail [4]
Shin, Sung Bong [5]
Kuhn, Kristen [5]
Yuan, Jeffrey [3]
Polevikov, Evgeny [4, 6]
Smith, Timothy P. L. [5]
Pevzner, Pavel A. [1, 7]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci & Engn, San Diego, CA 92103 USA
[2] USDA, Cell Wall Biol & Utilizat Lab, Dairy Forage Res Ctr, Madison, WI USA
[3] Univ Calif San Diego, Grad Program Bioinformat & Syst Biol, San Diego, CA 92103 USA
[4] St Petersburg State Univ, Ctr Algorithm Biotechnol, St Petersburg, Russia
[5] USDA ARS, Meat Anim Res Ctr, Clay Ctr, NE 68933 USA
[6] Bioinformat Inst, St Petersburg, Russia
[7] Univ Calif San Diego, Ctr Microbiome Innovat, San Diego, CA 92103 USA
Funding
Russian Science Foundation; National Institutes of Health (USA)
Keywords
HUMAN GENOME; IDENTIFICATION; REVEALS;
DOI
10.1038/s41592-020-00971-x
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Long-read sequencing technologies have substantially improved the assemblies of many isolate bacterial genomes as compared to fragmented short-read assemblies. However, assembling complex metagenomic datasets remains difficult even for state-of-the-art long-read assemblers. Here we present metaFlye, which addresses important long-read metagenomic assembly challenges, such as uneven bacterial composition and intra-species heterogeneity. First, we benchmarked metaFlye using simulated and mock bacterial communities and show that it consistently produces assemblies with better completeness and contiguity than state-of-the-art long-read assemblers. Second, we performed long-read sequencing of the sheep microbiome and applied metaFlye to reconstruct 63 complete or nearly complete bacterial genomes within single contigs. Finally, we show that long-read assembly of human microbiomes enables the discovery of full-length biosynthetic gene clusters that encode biomedically important natural products. Long-read metagenomics offers a valuable approach for profiling bacterial communities. This work presents a long-read assembler, metaFlye, that specifically addresses the challenges of assembling metagenomes.
Pages: 1103 / +
Page count: 18