Forecasting SARS-CoV-2 spike protein evolution from small data by deep learning and regression

Cited by: 0
Authors
King, Samuel [1, 2, 3]
Chen, Xinyi E. [1, 4, 5]
Ng, Sarah W. S. [1, 4, 5]
Rostin, Kimia [1, 4, 5]
Hahn, Samuel V. [1, 6]
Roberts, Tylo [1, 4]
Schwab, Janella C. [1, 7]
Sekhon, Parneet [1, 4]
Kagieva, Madina [1, 2, 3]
Reilly, Taylor [1, 2, 3]
Qi, Ruo Chen [1, 8]
Salman, Paarsa [1, 2, 3]
Hong, Ryan J. [1, 4]
Ma, Eric J. [9]
Hallam, Steven J. [1, 4, 9, 10, 11, 12]
Affiliations
[1] Univ British Columbia, BC Canc Agcy, Radiat Oncol, Vancouver, BC, Canada
[2] Univ British Columbia, Dept Bot, Vancouver, BC, Canada
[3] Univ British Columbia, Dept Zool, Vancouver, BC, Canada
[4] Univ British Columbia, Dept Microbiol & Immunol, Vancouver, BC, Canada
[5] Univ British Columbia, Dept Comp Sci, Vancouver, BC, Canada
[6] Univ British Columbia, Dept Chem & Biol Engn, Vancouver, BC, Canada
[7] Univ British Columbia, Fac Land & Food Syst, Vancouver, BC, Canada
[8] Univ British Columbia, Dept Cellular & Physiol Sci, Vancouver, BC, Canada
[9] Univ British Columbia, Grad Program Bioinformat, Vancouver, BC, Canada
[10] Univ British Columbia, Genome Sci & Technol Program, Vancouver, BC, Canada
[11] Univ British Columbia, Life Sci Inst, Vancouver, BC, Canada
[12] Univ British Columbia, ECOSCOPE Training Program, Vancouver, BC, Canada
Source
FRONTIERS IN SYSTEMS BIOLOGY | 2024 / Vol. 4
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
deep learning; regression; protein evolution; SARS-CoV-2; spike protein; small data; predictive model; GAUSSIAN PROCESS REGRESSION; VACCINE; MODEL;
DOI
10.3389/fsysb.2024.1284668
Chinese Library Classification number
Q [Biological Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
The emergence of SARS-CoV-2 variants during the COVID-19 pandemic caused frequent global outbreaks that confounded public health efforts across many jurisdictions, highlighting the need for better understanding and prediction of viral evolution. Predictive models have been shown to support disease prevention efforts, such as with the seasonal influenza vaccine, but they require abundant data. For emerging viruses of concern, such models should ideally function with the relatively sparse data typically encountered at the early stages of a viral outbreak. Conventional discrete approaches have proven difficult to develop due to the spurious and reversible nature of amino acid mutations and the overwhelming number of possible protein sequences adding computational complexity. We hypothesized that these challenges could be addressed by encoding discrete protein sequences into continuous numbers, effectively reducing the data size while enhancing the resolution of evolutionarily relevant differences. To this end, we developed a viral protein evolution prediction model (VPRE), which reduces amino acid sequences into continuous numbers by using an artificial neural network called a variational autoencoder (VAE) and models their most statistically likely evolutionary trajectories over time using Gaussian process (GP) regression. To demonstrate VPRE, we used a small number of early SARS-CoV-2 spike protein sequences. We show that the VAE can be trained on a synthetic dataset based on these data. To recapitulate evolution along a phylogenetic path, we used only 104 spike protein sequences and trained the GP regression with the numerical variables to project evolution up to 5 months into the future. Our predictions contained novel variants, and the most frequent prediction mapped primarily to a sequence that differed by only a single amino acid from the most reported spike protein within the prediction timeframe. Novel variants in the spike receptor binding domain (RBD) were capable of binding human angiotensin-converting enzyme 2 (ACE2) in silico, with comparable or better binding than previously resolved RBD-ACE2 complexes. Together, these results indicate the utility and tractability of combining deep learning and regression to model viral protein evolution with relatively sparse datasets, toward developing more effective medical interventions.
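The regression step described in the abstract (fitting a Gaussian process to continuous sequence encodings over time and projecting them forward) can be illustrated with a minimal sketch. This is not the VPRE implementation: the squared-exponential kernel, the hyperparameter values, the helper names `rbf_kernel` and `gp_forecast`, and the toy latent trajectory are all assumptions made for illustration. A real pipeline would obtain the latent coordinates from a trained VAE and decode the forecasts back into amino acid sequences.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=2.0, variance=1.0):
    """Squared-exponential covariance between two 1-D arrays of time points."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_forecast(t_train, z_train, t_test, noise=1e-2, length_scale=2.0):
    """Posterior mean and standard deviation of a GP at the times in t_test.

    The training mean is subtracted first so that a zero-mean GP prior applies.
    """
    offset = z_train.mean()
    K = rbf_kernel(t_train, t_train, length_scale) + noise * np.eye(len(t_train))
    K_s = rbf_kernel(t_train, t_test, length_scale)
    K_ss = rbf_kernel(t_test, t_test, length_scale)
    # Cholesky factorisation gives a numerically stable solve of K @ alpha = z.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, z_train - offset))
    mean = K_s.T @ alpha + offset
    v = np.linalg.solve(L, K_s)
    var = np.clip(np.diag(K_ss) - np.sum(v ** 2, axis=0), 0.0, None)
    return mean, np.sqrt(var)

# Hypothetical data: one latent coordinate observed over six months.
# In a real setting this would be one dimension of a VAE encoding.
rng = np.random.default_rng(0)
t_train = np.arange(0.0, 6.0, 0.5)  # sampling times in months
z_train = 0.3 * t_train + 0.05 * rng.standard_normal(t_train.size)

# Project up to about 5 months past the last observation.
t_test = np.array([6.0, 8.0, 10.0])
mean, std = gp_forecast(t_train, z_train, t_test)
for t, m, s in zip(t_test, mean, std):
    print(f"month {t:4.1f}: latent value {m:+.3f} +/- {s:.3f}")
```

The predictive standard deviation grows as the forecast moves away from the observed time points, which is one reason a GP may suit sparse early-outbreak data: each projection comes with an explicit uncertainty estimate.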
Pages: 13