Identification of mobile genetic elements with geNomad

Cited by: 248
Authors
Camargo, Antonio Pedro [1]
Roux, Simon [1]
Schulz, Frederik [1]
Babinski, Michal [2]
Xu, Yan [2]
Hu, Bin [2]
Chain, Patrick S. G. [2]
Nayfach, Stephen [1]
Kyrpides, Nikos C. [1]
Affiliations
[1] Lawrence Berkeley Natl Lab, DOE Joint Genome Inst, Berkeley, CA 94720 USA
[2] Los Alamos Natl Lab, Biosci Div, Los Alamos, NM USA
Keywords
BACTERIAL; VIRUSES; DATABASE
DOI
10.1038/s41587-023-01953-y
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Identifying and characterizing mobile genetic elements in sequencing data is essential for understanding their diversity, ecology, biotechnological applications and impact on public health. Here we introduce geNomad, a classification and annotation framework that combines information from gene content and a deep neural network to identify sequences of plasmids and viruses. geNomad uses a dataset of more than 200,000 marker protein profiles to provide functional gene annotation and taxonomic assignment of viral genomes. Using a conditional random field model, geNomad also detects proviruses integrated into host genomes with high precision. In benchmarks, geNomad achieved high classification performance for diverse plasmids and viruses (Matthews correlation coefficient of 77.8% and 95.3%, respectively), substantially outperforming other tools. Leveraging geNomad's speed and scalability, we processed over 2.7 trillion base pairs of sequencing data, leading to the discovery of millions of viruses and plasmids that are available through the IMG/VR and IMG/PR databases. geNomad is available at https://portal.nersc.gov/genomad.
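The abstract reports classification performance as a Matthews correlation coefficient (MCC) expressed as a percentage. As a reading aid, the sketch below shows the standard binary MCC formula computed from confusion-matrix counts. It is not taken from geNomad's code or benchmark scripts; the function name and the example counts are hypothetical, and returning 0.0 for an undefined denominator is a common convention assumed here.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary MCC from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (an assumed convention
    for the case where a whole row or column of the matrix is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only (not from the paper).
mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
print(f"MCC = {mcc:.3f} ({mcc * 100:.1f}%)")
```

MCC ranges from -1 to 1, so a value quoted as 77.8% corresponds to 0.778 on that scale.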
Pages: 1303-1312
Page count: 23