Impact of reference design on estimating SARS-CoV-2 lineage abundances from wastewater sequencing data

Cited by: 1
Authors
Assmann, Eva [1, 2]
Agrawal, Shelesh [3]
Orschler, Laura [3]
Boettcher, Sindy [4]
Lackner, Susanne [3]
Hoelzer, Martin [1]
Affiliations
[1] Robert Koch Inst, Genome Competence Ctr MF1, D-13353 Berlin, Germany
[2] Robert Koch Inst, Ctr Artificial Intelligence Publ Hlth Res ZKI PH, D-13353 Berlin, Germany
[3] Tech Univ Darmstadt, Inst IWAR, Chair Water & Environm Biotechnol, Dept Civil & Environm Engn Sci, D-62487 Darmstadt, Germany
[4] Robert Koch Inst, Gastroenteritis & Hepatitis Pathogens & Enteroviru, D-13353 Berlin, Germany
Keywords
SARS-CoV-2; wastewater; sewage; abundance estimation; next-generation sequencing; benchmark; genomic surveillance
DOI
10.1093/gigascience/giae051
Chinese Library Classification
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: Sequencing of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) RNA from wastewater samples has emerged as a valuable tool for detecting the presence and relative abundances of SARS-CoV-2 variants in a community. By analyzing the viral genetic material present in wastewater, researchers and public health authorities can gain early insights into the spread of virus lineages and emerging mutations. Constructing reference datasets from known SARS-CoV-2 lineages and their mutation profiles has become state-of-the-art for assigning viral lineages and their relative abundances from wastewater sequencing data. However, selecting reference sequences or mutations directly affects the predictive power.
Results: Here, we show the impact of a mutation- and sequence-based reference reconstruction for SARS-CoV-2 abundance estimation. We benchmark 3 datasets: (i) synthetic "spike-in" mixtures; (ii) German wastewater samples from early 2021, mainly comprising Alpha; and (iii) samples obtained from wastewater at an international airport in Germany from the end of 2021, including first signals of Omicron. The 2 approaches differ in sublineage detection, with the marker mutation-based method, in particular, being challenged by the increasing number of mutations and lineages. However, the estimations of both approaches depend on selecting representative references and optimized parameter settings. By performing parameter escalation experiments, we demonstrate the effects of reference size and alternative allele frequency cutoffs for abundance estimation. We show how different parameter settings can lead to different results for our test datasets and illustrate the effects of virus lineage composition of wastewater samples and references.
Conclusions: Our study highlights current computational challenges, focusing on the general reference design, which directly impacts abundance allocations. We illustrate advantages and disadvantages that may be relevant for further developments in the wastewater community and in the context of defining robust quality metrics.
Pages: 16