Forecasting SARS-CoV-2 spike protein evolution from small data by deep learning and regression

Cited by: 0
Authors
King, Samuel [1 ,2 ,3 ]
Chen, Xinyi E. [1 ,4 ,5 ]
Ng, Sarah W. S. [1 ,4 ,5 ]
Rostin, Kimia [1 ,4 ,5 ]
Hahn, Samuel V. [1 ,6 ]
Roberts, Tylo [1 ,4 ]
Schwab, Janella C. [1 ,7 ]
Sekhon, Parneet [1 ,4 ]
Kagieva, Madina [1 ,2 ,3 ]
Reilly, Taylor [1 ,2 ,3 ]
Qi, Ruo Chen [1 ,8 ]
Salman, Paarsa [1 ,2 ,3 ]
Hong, Ryan J. [1 ,4 ]
Ma, Eric J. [9 ]
Hallam, Steven J. [1 ,4 ,9 ,10 ,11 ,12 ]
Affiliations
[1] Univ British Columbia, BC Canc Agcy, Radiat Oncol, Vancouver, BC, Canada
[2] Univ British Columbia, Dept Bot, Vancouver, BC, Canada
[3] Univ British Columbia, Dept Zool, Vancouver, BC, Canada
[4] Univ British Columbia, Dept Microbiol & Immunol, Vancouver, BC, Canada
[5] Univ British Columbia, Dept Comp Sci, Vancouver, BC, Canada
[6] Univ British Columbia, Dept Chem & Biol Engn, Vancouver, BC, Canada
[7] Univ British Columbia, Fac Land & Food Syst, Vancouver, BC, Canada
[8] Univ British Columbia, Dept Cellular & Physiol Sci, Vancouver, BC, Canada
[9] Univ British Columbia, Grad Program Bioinformat, Vancouver, BC, Canada
[10] Univ British Columbia, Genome Sci & Technol Program, Vancouver, BC, Canada
[11] Univ British Columbia, Life Sci Inst, Vancouver, BC, Canada
[12] Univ British Columbia, ECOSCOPE Training Program, Vancouver, BC, Canada
Source
FRONTIERS IN SYSTEMS BIOLOGY | 2024 / Volume 4
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
deep learning; regression; protein evolution; SARS-CoV-2; spike protein; small data; predictive model; GAUSSIAN PROCESS REGRESSION; VACCINE; MODEL;
DOI
10.3389/fsysb.2024.1284668
Chinese Library Classification number
Q [Biological Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
The emergence of SARS-CoV-2 variants during the COVID-19 pandemic caused frequent global outbreaks that confounded public health efforts across many jurisdictions, highlighting the need for better understanding and prediction of viral evolution. Predictive models have been shown to support disease prevention efforts, such as with the seasonal influenza vaccine, but they require abundant data. For emerging viruses of concern, such models should ideally function with relatively sparse data typically encountered at the early stages of a viral outbreak. Conventional discrete approaches have proven difficult to develop due to the spurious and reversible nature of amino acid mutations and the overwhelming number of possible protein sequences adding computational complexity. We hypothesized that these challenges could be addressed by encoding discrete protein sequences into continuous numbers, effectively reducing the data size while enhancing the resolution of evolutionarily relevant differences. To this end, we developed a viral protein evolution prediction model (VPRE), which reduces amino acid sequences into continuous numbers by using an artificial neural network called a variational autoencoder (VAE) and models their most statistically likely evolutionary trajectories over time using Gaussian process (GP) regression. To demonstrate VPRE, we used a small number of early SARS-CoV-2 spike protein sequences. We show that the VAE can be trained on a synthetic dataset based on these data. To recapitulate evolution along a phylogenetic path, we used only 104 spike protein sequences and trained the GP regression with the numerical variables to project evolution up to 5 months into the future. Our predictions contained novel variants and the most frequent prediction mapped primarily to a sequence that differed by only a single amino acid from the most reported spike protein within the prediction timeframe. Novel variants in the spike receptor binding domain (RBD) were capable of binding human angiotensin-converting enzyme 2 (ACE2) in silico, with comparable or better binding than previously resolved RBD-ACE2 complexes. Together, these results indicate the utility and tractability of combining deep learning and regression to model viral protein evolution with relatively sparse datasets, toward developing more effective medical interventions.
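The encode, forecast, decode pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' VPRE implementation. It assumes toy 8-residue sequences (hypothetical, one per month) and swaps in PCA as a linear stand-in for the VAE encoder and decoder. It also uses a hand-written zero-mean GP with a squared-exponential kernel whose hyperparameters are fixed rather than fitted. All function and variable names are illustrative.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    """Flatten a protein sequence into a one-hot vector of length 20 * len(seq)."""
    mat = np.zeros((len(seq), len(AMINO_ACIDS)))
    for i, aa in enumerate(seq):
        mat[i, AMINO_ACIDS.index(aa)] = 1.0
    return mat.ravel()

def rbf_kernel(a, b, length_scale=2.0, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of time points."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_predict(t_train, y_train, t_test, noise=1e-2):
    """Posterior mean and pointwise variance of a zero-mean GP at t_test."""
    K = rbf_kernel(t_train, t_train) + noise * np.eye(len(t_train))
    K_s = rbf_kernel(t_test, t_train)
    mean = K_s @ np.linalg.solve(K, y_train)
    cov = rbf_kernel(t_test, t_test) - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.diag(cov)

# Hypothetical toy data: one short sequence per month (not real spike data).
sequences = ["ACDEFGHI", "ACDEFGHK", "ACDEYGHK", "ACNEYGRK", "ACNEYGRK"]
months = np.arange(len(sequences), dtype=float)

# "Encoder": PCA via SVD, used here in place of the VAE (assumption).
X = np.stack([one_hot(s) for s in sequences])
X_mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - X_mean, full_matrices=False)
n_latent = 2
Z = (X - X_mean) @ Vt[:n_latent].T  # continuous latent coordinates, shape (5, 2)

# Forecast each latent dimension independently for two future months.
t_future = np.array([5.0, 6.0])
results = [gp_predict(months, Z[:, d], t_future) for d in range(n_latent)]
z_pred = np.column_stack([mean for mean, _ in results])
z_var = np.column_stack([var for _, var in results])

# "Decoder": map latent points back to one-hot space, take argmax per position.
X_hat = z_pred @ Vt[:n_latent] + X_mean
idx = X_hat.reshape(len(t_future), -1, len(AMINO_ACIDS)).argmax(axis=2)
decoded = ["".join(AMINO_ACIDS[j] for j in row) for row in idx]

for t, seq, var in zip(t_future, decoded, z_var):
    print(f"month {t:.0f}: {seq} (latent variances {var.round(3)})")
```

In this sketch the posterior variance reported for each forecast gives a rough sense of how uncertain the projected latent position is. A nonlinear VAE and fitted kernel hyperparameters, as the abstract implies, would replace the PCA step and the fixed `length_scale`.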
Pages: 13