Optimized quantification of intra-host viral diversity in SARS-CoV-2 and influenza virus sequence data

Cited by: 9
Authors
Roder, A. E. [1]
Johnson, K. E. E. [1, 2]
Knoll, M. [2]
Khalfan, M. [2]
Wang, B. [2]
Schultz-Cherry, S. [3]
Banakis, S. [1]
Kreitman, A. [1]
Mederos, C. [1]
Youn, J.-H. [4]
Mercado, R. [4]
Wang, W. [1]
Chung, M. [1]
Ruchnewitz, D. [5]
Samanovic, M. I. [6]
Mulligan, M. J. [6]
Laessig, M. [5]
Luksza, M. [7]
Das, S. [4]
Gresham, D. [2]
Ghedin, E. [1, 2]
Affiliations
[1] NIAID, Syst Genom Sect, Lab Parasit Dis, DIR,NIH, Bethesda, MD 20892 USA
[2] NYU, Ctr Genom & Syst Biol, Dept Biol, New York, NY 10012 USA
[3] St Jude Childrens Res Hosp, Dept Infect Dis, Memphis, TN USA
[4] NIH, Dept Lab Med, Bethesda, MD USA
[5] Univ Cologne, Inst Biol Phys, Cologne, Germany
[6] NYU, Langone Vaccine Ctr, Dept Med, New York, NY USA
[7] Icahn Sch Med Mt Sinai, Dept Oncol Sci, New York, NY USA
Source
MBIO | 2023 / Volume 14 / Issue 04
Keywords
SARS-CoV-2; influenza; genomics; bioinformatics; RNA; SELECTION; EVOLUTION; MUTATION; CANCER;
DOI
10.1128/mbio.01046-23
Chinese Library Classification number
Q93 [Microbiology];
Discipline classification code
071005 ; 100705 ;
Abstract
High error rates of viral RNA-dependent RNA polymerases lead to diverse intra-host viral populations during infection. Errors made during replication that are not strongly deleterious to the virus can lead to the generation of minority variants. However, accurate detection of minority variants in viral sequence data is complicated by errors introduced during sample preparation and data analysis. We used synthetic RNA controls and simulated data to test seven variant-calling tools across a range of allele frequencies and simulated coverages. We show that choice of variant caller and use of replicate sequencing have the most significant impact on single-nucleotide variant (SNV) discovery and demonstrate how both allele frequency and coverage thresholds impact both false discovery and false-negative rates. When replicates are not available, using a combination of multiple callers with more stringent cutoffs is recommended. We use these parameters to find minority variants in sequencing data from SARS-CoV-2 clinical specimens and provide guidance for studies of intra-host viral diversity using either single replicate data or data from technical replicates. Our study provides a framework for rigorous assessment of technical factors that impact SNV identification in viral samples and establishes heuristics that will inform and improve future studies of intra-host variation, viral diversity, and viral evolution.
IMPORTANCE: When viruses replicate inside a host cell, the virus replication machinery makes mistakes. Over time, these mistakes create mutations that result in a diverse population of viruses inside the host. Mutations that are neither lethal to the virus nor strongly beneficial can lead to minority variants that are minor members of the virus population. However, preparing samples for sequencing can also introduce errors that resemble minority variants, resulting in the inclusion of false-positive data if not filtered correctly. In this study, we aimed to determine the best methods for identification and quantification of these minority variants by testing the performance of seven commonly used variant-calling tools. We used simulated and synthetic data to test their performance against a true set of variants and then used these studies to inform variant identification in data from SARS-CoV-2 clinical specimens. Together, analyses of our data provide extensive guidance for future studies of viral diversity and evolution.
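The filtering idea described in the abstract (allele frequency and coverage cutoffs, combined with agreement across technical replicates or across several callers) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `SNV` record, the `consensus_calls` helper, and the 3% frequency and 200x depth cutoffs are hypothetical placeholders, not the tools, code, or thresholds used or recommended in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SNV:
    """Hypothetical minimal record for one called single-nucleotide variant."""
    pos: int      # 1-based genome position
    ref: str      # reference base
    alt: str      # alternate base
    freq: float   # alternate allele frequency, between 0 and 1
    depth: int    # read depth (coverage) at the site


def passes(v: SNV, min_freq: float, min_depth: int) -> bool:
    """True if the call meets both the frequency and the coverage cutoff."""
    return v.freq >= min_freq and v.depth >= min_depth


def consensus_calls(call_sets, min_freq, min_depth, min_support):
    """Return variants that pass the cutoffs in at least `min_support` call sets.

    Each call set may come from one technical replicate or from one variant
    caller; the same counting logic covers both cases.
    """
    counts = {}
    for calls in call_sets:
        # Use a set so a variant is counted at most once per call set.
        seen = {(v.pos, v.ref, v.alt) for v in calls
                if passes(v, min_freq, min_depth)}
        for key in seen:
            counts[key] = counts.get(key, 0) + 1
    return sorted(key for key, n in counts.items() if n >= min_support)


# Toy data (invented for this example): two technical replicates.
rep1 = [
    SNV(100, "A", "G", 0.05, 500),    # passes both cutoffs
    SNV(200, "C", "T", 0.008, 1000),  # frequency too low
    SNV(300, "G", "A", 0.10, 50),     # coverage too low
]
rep2 = [
    SNV(100, "A", "G", 0.04, 450),
    SNV(300, "G", "A", 0.12, 400),
]

# Placeholder cutoffs: 3% allele frequency, 200x depth, both replicates agree.
shared = consensus_calls([rep1, rep2], min_freq=0.03, min_depth=200,
                         min_support=2)
print(shared)  # [(100, 'A', 'G')]
```

Requiring a variant to appear in every call set trades sensitivity for fewer false positives; lowering `min_support` or the cutoffs moves the balance the other way, which is the trade-off between false discovery and false-negative rates that the abstract describes.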
Pages: 17
Related papers
50 in total
  • [31] Breathing Is Enough: For the Spread of Influenza Virus and SARS-CoV-2 by Breathing Only
    Scheuch, Gerhard
    JOURNAL OF AEROSOL MEDICINE AND PULMONARY DRUG DELIVERY, 2020, 33 (04) : 230 - 234
  • [32] A Novel SARS-CoV-2 Viral Sequence Bioinformatic Pipeline Has Found Genetic Evidence That the Viral 3′ Untranslated Region (UTR) Is Evolving and Generating Increased Viral Diversity
    Farkas, Carlos
    Mella, Andy
    Turgeon, Maxime
    Haigh, Jody J.
    FRONTIERS IN MICROBIOLOGY, 2021, 12
  • [33] A Single Vaccine Protects against SARS-CoV-2 and Influenza Virus in Mice
    Cao, Kangli
    Wang, Xiang
    Peng, Haoran
    Ding, Longfei
    Wang, Xiangwei
    Hu, Yangyang
    Dong, Lanlan
    Yang, Tianhan
    Hong, Xiujing
    Xing, Man
    Li, Duoduo
    Zhu, Cuisong
    He, Xiangchuan
    Zhao, Chen
    Zhao, Ping
    Zhou, Dongming
    Zhang, Xiaoyan
    Xu, Jianqing
    JOURNAL OF VIROLOGY, 2022, 96 (04)
  • [34] The battle between host and SARS-CoV-2: Innate immunity and viral evasion strategies
    Zhang, Shilei
    Wang, Lulan
    Cheng, Genhong
    MOLECULAR THERAPY, 2022, 30 (05) : 1869 - 1884
  • [35] SARS-CoV-2 outbreak: role of viral proteins and genomic diversity in virus infection and COVID-19 progression
    Hussein, Hosni A. M.
    Thabet, Ali A.
    Wardany, Ahmed A.
    El-Adly, Ahmed M.
    Ali, Mohamed
    Hassan, Mohamed E. A.
    Abdeldayem, Mohamed A. B.
    Mohamed, Abdul-Rahman M. A.
    Sobhy, Ali
    El-Mokhtar, Mohamed A.
    Afifi, Magdy M.
    Fathy, Samah M.
    Sultan, Serageldeen
    VIROLOGY JOURNAL, 2024, 21 (01)
  • [36] Detailed Molecular Interactions of Favipiravir with SARS-CoV-2, SARS-CoV, MERS-CoV, and Influenza Virus Polymerases In Silico
    Sada, Mitsuru
    Saraya, Takeshi
    Ishii, Haruyuki
    Okayama, Kaori
    Hayashi, Yuriko
    Tsugawa, Takeshi
    Nishina, Atsuyoshi
    Murakami, Koichi
    Kuroda, Makoto
    Ryo, Akihide
    Kimura, Hirokazu
    MICROORGANISMS, 2020, 8 (10) : 1 - 9
  • [37] SARS-CoV-2 outbreak: role of viral proteins and genomic diversity in virus infection and COVID-19 progression
    Hosni A. M. Hussein
    Ali A. Thabet
    Ahmed A. Wardany
    Ahmed M. El-Adly
    Mohamed Ali
    Mohamed E. A. Hassan
    Mohamed A. B. Abdeldayem
    Abdul-Rahman M. A. Mohamed
    Ali Sobhy
    Mohamed A. El-Mokhtar
    Magdy M. Afifi
    Samah M. Fathy
    Serageldeen Sultan
    Virology Journal, 21
  • [38] Environmental risk factors of airborne viral transmission: Humidity, Influenza and SARS-CoV-2 in the Netherlands
    Ravelli, Edsard
    Martinez, Rolando Gonzales
    SPATIAL AND SPATIO-TEMPORAL EPIDEMIOLOGY, 2022, 41
  • [39] Host-dependent C-to-U RNA editing in SARS-CoV-2 creates novel viral genes with optimized expressibility
    Zhang, Pirun
    Zhang, Wenli
    Li, Jiahuan
    Liu, Huiying
    Yu, Yantong
    Yang, Xiaoping
    Jiang, Wenqing
    FRONTIERS IN CELLULAR AND INFECTION MICROBIOLOGY, 2024, 14
  • [40] A hybrid PDE-ABM model for viral dynamics with application to SARS-CoV-2 and influenza
    Marzban, Sadegh
    Han, Renji
    Juhasz, Nora
    Rost, Gergely
    ROYAL SOCIETY OPEN SCIENCE, 2021, 8 (11)