Optimized quantification of intra-host viral diversity in SARS-CoV-2 and influenza virus sequence data

Cited by: 9
Authors
Roder, A. E. [1]
Johnson, K. E. E. [1, 2]
Knoll, M. [2]
Khalfan, M. [2]
Wang, B. [2]
Schultz-Cherry, S. [3]
Banakis, S. [1]
Kreitman, A. [1]
Mederos, C. [1]
Youn, J.-H. [4]
Mercado, R. [4]
Wang, W. [1]
Chung, M. [1]
Ruchnewitz, D. [5]
Samanovic, M. I. [6]
Mulligan, M. J. [6]
Laessig, M. [5]
Luksza, M. [7]
Das, S. [4]
Gresham, D. [2]
Ghedin, E. [1, 2]
Affiliations
[1] NIAID, Syst Genom Sect, Lab Parasit Dis, DIR,NIH, Bethesda, MD 20892 USA
[2] NYU, Ctr Genom & Syst Biol, Dept Biol, New York, NY 10012 USA
[3] St Jude Childrens Res Hosp, Dept Infect Dis, Memphis, TN USA
[4] NIH, Dept Lab Med, Bethesda, MD USA
[5] Univ Cologne, Inst Biol Phys, Cologne, Germany
[6] NYU, Langone Vaccine Ctr, Dept Med, New York, NY USA
[7] Icahn Sch Med Mt Sinai, Dept Oncol Sci, New York, NY USA
Source
MBIO | 2023 / Volume 14 / Issue 04
Keywords
SARS-CoV-2; influenza; genomics; bioinformatics; RNA; SELECTION; EVOLUTION; MUTATION; CANCER;
DOI
10.1128/mbio.01046-23
Chinese Library Classification number
Q93 [Microbiology];
Discipline classification codes
071005 ; 100705 ;
Abstract
High error rates of viral RNA-dependent RNA polymerases lead to diverse intra-host viral populations during infection. Errors made during replication that are not strongly deleterious to the virus can lead to the generation of minority variants. However, accurate detection of minority variants in viral sequence data is complicated by errors introduced during sample preparation and data analysis. We used synthetic RNA controls and simulated data to test seven variant-calling tools across a range of allele frequencies and simulated coverages. We show that choice of variant caller and use of replicate sequencing have the most significant impact on single-nucleotide variant (SNV) discovery and demonstrate how both allele frequency and coverage thresholds impact both false discovery and false-negative rates. When replicates are not available, using a combination of multiple callers with more stringent cutoffs is recommended. We use these parameters to find minority variants in sequencing data from SARS-CoV-2 clinical specimens and provide guidance for studies of intra-host viral diversity using either single replicate data or data from technical replicates. Our study provides a framework for rigorous assessment of technical factors that impact SNV identification in viral samples and establishes heuristics that will inform and improve future studies of intra-host variation, viral diversity, and viral evolution.
IMPORTANCE: When viruses replicate inside a host cell, the virus replication machinery makes mistakes. Over time, these mistakes create mutations that result in a diverse population of viruses inside the host. Mutations that are neither lethal to the virus nor strongly beneficial can lead to minority variants that are minor members of the virus population. However, preparing samples for sequencing can also introduce errors that resemble minority variants, resulting in the inclusion of false-positive data if not filtered correctly. In this study, we aimed to determine the best methods for identification and quantification of these minority variants by testing the performance of seven commonly used variant-calling tools. We used simulated and synthetic data to test their performance against a true set of variants and then used these studies to inform variant identification in data from SARS-CoV-2 clinical specimens. Together, analyses of our data provide extensive guidance for future studies of viral diversity and evolution.
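The filtering strategy described in the abstract (allele frequency and coverage cutoffs, agreement across callers or replicates, and scoring against a known truth set) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Call` record, the function names, the toy variant calls, and the cutoff values (3% frequency, 200x depth) are hypothetical and are not the thresholds or code used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One single-nucleotide variant call (hypothetical record layout)."""
    pos: int      # genome position
    ref: str      # reference base
    alt: str      # alternate base
    freq: float   # allele frequency of the alternate base (0..1)
    depth: int    # read coverage at this position


def passing_sites(calls, min_freq, min_depth):
    """Return the set of (pos, ref, alt) sites that meet both cutoffs."""
    return {
        (c.pos, c.ref, c.alt)
        for c in calls
        if c.freq >= min_freq and c.depth >= min_depth
    }


def consensus_sites(call_sets, min_freq, min_depth, min_support):
    """Keep sites that pass the cutoffs in at least `min_support` call sets.

    A call set can be the output of one caller or of one technical replicate.
    """
    counts = {}
    for calls in call_sets:
        for site in passing_sites(calls, min_freq, min_depth):
            counts[site] = counts.get(site, 0) + 1
    return {site for site, n in counts.items() if n >= min_support}


def error_rates(called, truth):
    """Return (false discovery rate, false-negative rate) against a truth set."""
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    fdr = fp / (tp + fp) if called else 0.0
    fnr = fn / (tp + fn) if truth else 0.0
    return fdr, fnr


# Toy data: three true variants and two hypothetical callers.
truth = {(100, "A", "G"), (200, "C", "T"), (300, "G", "A")}
caller_a = [
    Call(100, "A", "G", 0.05, 500),
    Call(200, "C", "T", 0.02, 800),   # true, but below the frequency cutoff
    Call(400, "T", "C", 0.04, 300),   # false positive unique to caller A
    Call(300, "G", "A", 0.10, 50),    # true, but below the depth cutoff
]
caller_b = [
    Call(100, "A", "G", 0.06, 500),
    Call(200, "C", "T", 0.025, 800),
    Call(500, "A", "T", 0.05, 400),   # false positive unique to caller B
    Call(300, "G", "A", 0.11, 50),
]

single = passing_sites(caller_a, min_freq=0.03, min_depth=200)
combined = consensus_sites(
    [caller_a, caller_b], min_freq=0.03, min_depth=200, min_support=2
)

single_fdr, single_fnr = error_rates(single, truth)
combined_fdr, combined_fnr = error_rates(combined, truth)
print("single caller:", single_fdr, single_fnr)
print("two-caller consensus:", combined_fdr, combined_fnr)
```

In this toy case, requiring both callers to agree removes the caller-specific false positives, while the variants lost to the frequency and depth cutoffs stay lost. That mirrors the trade-off between false discovery and false-negative rates that the abstract describes, without reproducing any of its actual numbers.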
Pages: 17