Machine Learning Early Detection of SARS-CoV-2 High-Risk Variants

Cited by: 1
Authors
Li, Lun [1, 2]
Li, Cuiping [1, 2]
Li, Na [1, 2]
Zou, Dong [1, 2]
Zhao, Wenming [1, 2, 3, 4]
Luo, Hong [1, 2]
Xue, Yongbiao [1, 2, 4]
Zhang, Zhang [1, 2, 3, 4]
Bao, Yiming [1, 2, 3, 4]
Song, Shuhui [1, 2, 3, 4]
Affiliations
[1] China Natl Ctr Bioinformat, Beijing 100101, Peoples R China
[2] Chinese Acad Sci, Beijing Inst Genom, Natl Genom Data Ctr, Beijing 100101, Peoples R China
[3] Chinese Acad Sci, Beijing Inst Genom, CAS Key Lab Genome Sci & Informat, Beijing 100101, Peoples R China
[4] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
haplotype network; high-risk variant; machine learning; pre-warning; SARS-CoV-2; haplotypes
DOI
10.1002/advs.202405058
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has evolved many high-risk variants, resulting in repeated COVID-19 waves over the past years. Accurate early warning of high-risk variants is therefore vital for epidemic prevention and control. However, detecting high-risk variants through experimental and epidemiological research is time-consuming and often lags behind the emergence and spread of these variants. In this study, HiRisk-Detector, a machine learning algorithm based on haplotype networks, is developed for the early computational detection of high-risk SARS-CoV-2 variants. Its effectiveness, robustness, and generalizability are validated using over 7.6 million high-quality, complete SARS-CoV-2 genomes and their metadata. First, HiRisk-Detector is evaluated on empirical data, where it detects all 13 high-risk variants and precedes World Health Organization announcements by 27 days on average. Second, its robustness is tested by reducing sequencing intensity to one-fourth, which delays detection by only 3.8 days. Third, HiRisk-Detector is applied to detect risks among SARS-CoV-2 Omicron variant sub-lineages, where it achieves high ROC-AUC and PR-AUC, supporting its broad applicability. Overall, HiRisk-Detector shows a strong capacity for early detection of high-risk variants and may be useful in other public health emergencies caused by infectious diseases or viruses.
This study first validates a correlation between haplotype network features and the risk levels of SARS-CoV-2 variants. Building on this, HiRisk-Detector, a machine learning algorithm, is developed for the early detection of high-risk variants. The effectiveness, robustness, and generalizability of HiRisk-Detector are confirmed using over 7.6 million SARS-CoV-2 genomes.
Pages: 12