Reads Binning Improves the Assembly of Viral Genome Sequences From Metagenomic Samples

Author
Song, Kai [1]
Affiliation
[1] Qingdao University, School of Mathematics and Statistics, Qingdao, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
metagenome; Markov chain; virus; assembly; contigs; viruses; alignment; virome
DOI
10.3389/fmicb.2021.664560
Chinese Library Classification (CLC) number
Q93 [Microbiology]
Discipline classification code
071005; 100705
Abstract
Metagenomes can be considered mixtures of viral, bacterial, and eukaryotic DNA sequences. Mining viral sequences from metagenomes can shed light on virus-host relationships and expand viral databases. Current alignment-based methods are poorly suited to identifying viral sequences in metagenomes, because most assembled metagenomic contigs are short and carry few or no predicted genes, and most metagenomic viral genes are dissimilar to known viral genes. In this study, I developed a Markov model-based method, VirMC, to identify viral sequences in metagenomic data. VirMC uses Markov chains to model sequence signatures and constructs a scoring model based on a likelihood test to distinguish viral from bacterial sequences. Compared with two other state-of-the-art viral sequence-prediction methods, VirFinder and PPR-Meta, VirMC outperformed VirFinder and performed comparably to PPR-Meta on short contigs (length < 400 bp). On metagenomic samples contaminated with eukaryotic sequences, VirMC outperformed both VirFinder and PPR-Meta. By filtering out potential bacterial reads before assembly, VirMC also improved the assembly of viral genome sequences from metagenomic data. Applying VirMC to human gut metagenomes from healthy subjects and patients with type 2 diabetes (T2D) revealed that viral contigs can help distinguish healthy from diseased individuals. This alignment-free method complements gene-based alignment approaches and will significantly improve the precision of viral sequence identification.
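The abstract describes VirMC's scoring only at a high level: Markov chains model sequence signatures, and a likelihood-based score separates viral from bacterial sequences, which can then be used to filter reads before assembly. The sketch below illustrates that general idea and is not the author's implementation. The Markov order `k`, the Laplace pseudocounts, the per-transition normalization, the threshold of 0, the function names (`train_markov`, `viral_score`, `filter_reads`), and the toy training sequences are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"


def train_markov(seqs, k=2, pseudocount=1.0):
    """Estimate order-k transition log-probabilities log P(next base | previous k bases).

    Laplace pseudocounts are an assumption; they keep unseen transitions
    from receiving probability zero.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        s = s.upper()
        for i in range(k, len(s)):
            ctx, nxt = s[i - k:i], s[i]
            # Skip transitions that involve ambiguous bases such as N.
            if nxt in ALPHABET and all(c in ALPHABET for c in ctx):
                counts[ctx][nxt] += 1
    model = {}
    for ctx_tuple in product(ALPHABET, repeat=k):
        ctx = "".join(ctx_tuple)
        total = sum(counts[ctx].values()) + pseudocount * len(ALPHABET)
        model[ctx] = {
            b: math.log((counts[ctx][b] + pseudocount) / total) for b in ALPHABET
        }
    return model


def log_likelihood(seq, model, k):
    """Sum of log transition probabilities of seq under an order-k Markov chain."""
    seq = seq.upper()
    ll = 0.0
    for i in range(k, len(seq)):
        ctx, nxt = seq[i - k:i], seq[i]
        if ctx in model and nxt in ALPHABET:  # ignore positions touching N etc.
            ll += model[ctx][nxt]
    return ll


def viral_score(seq, viral_model, bact_model, k):
    """Per-transition log-likelihood ratio; a score > 0 means 'more viral-like'."""
    n = max(len(seq) - k, 1)
    return (log_likelihood(seq, viral_model, k)
            - log_likelihood(seq, bact_model, k)) / n


def filter_reads(reads, viral_model, bact_model, k, threshold=0.0):
    """Drop reads that look bacterial; keep the rest for viral assembly."""
    return [r for r in reads
            if viral_score(r, viral_model, bact_model, k) > threshold]


if __name__ == "__main__":
    # Toy training data (hypothetical); real models would be trained on
    # reference viral and bacterial genomes.
    K = 2
    viral = train_markov(["AT" * 10], k=K)
    bacterial = train_markov(["GC" * 10], k=K)
    print(filter_reads(["ATATATAT", "GCGCGCGC"], viral, bacterial, K))  # → ['ATATATAT']
```

Dividing the log-likelihood ratio by the number of transitions is one possible design choice. It makes scores from reads and contigs of different lengths comparable under a single threshold, at the cost of discarding the extra evidence that longer sequences carry.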
Pages: 12