VPF-Class: taxonomic assignment and host prediction of uncultivated viruses based on viral protein families

Cited by: 76
Authors
Pons, Joan Carles [1]
Paez-Espino, David [2]
Riera, Gabriel [1]
Ivanova, Natalia [2]
Kyrpides, Nikos C. [2]
Llabres, Merce [1]
Affiliations
[1] Univ Balearic Isl, Dept Math & Comp Sci, Palma De Mallorca 07122, Spain
[2] Dept Energy Joint Genome Inst, Berkeley, CA 94720 USA
Keywords
DOI
10.1093/bioinformatics/btab026
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Two key steps in the analysis of uncultured viruses recovered from metagenomes are the taxonomic classification of the viral sequences and the identification of putative host(s). Both steps rely mainly on the assignment of viral proteins to orthologs in cultivated viruses. Viral Protein Families (VPFs) can be used for the robust identification of new viral sequences in large metagenomics datasets. Despite the importance of VPF information for viral discovery, VPFs have not yet been explored for determining viral taxonomy and host targets.
Results: In this work, we classified the set of VPFs from the IMG/VR database and developed VPF-Class, a tool that automates the taxonomic classification and host prediction of viral contigs based on the assignment of their proteins to a set of classified VPFs. Applying VPF-Class to 731K uncultivated virus contigs from the IMG/VR database, we were able to classify 363K contigs at the genus level and predict the host of over 461K contigs. On the RefSeq database, VPF-Class reported an accuracy of nearly 100% in classifying dsDNA, ssDNA and retroviruses at the genus level, using a membership ratio and a confidence score of 0.2. The accuracy in host prediction was 86.4%, also at the genus level, using a membership ratio of 0.3 and a confidence score of 0.5. In the prophages dataset, the accuracy in host prediction was 86%, using a membership ratio of 0.6 and a confidence score of 0.8. Moreover, from the Global Ocean Virome dataset, over 817K viral contigs out of 1 million were classified.
Pages: 1805-1813
Page count: 9