Forecasting SARS-CoV-2 spike protein evolution from small data by deep learning and regression

Cited by: 0
Authors
King, Samuel [1, 2, 3]
Chen, Xinyi E. [1, 4, 5]
Ng, Sarah W. S. [1, 4, 5]
Rostin, Kimia [1, 4, 5]
Hahn, Samuel V. [1, 6]
Roberts, Tylo [1, 4]
Schwab, Janella C. [1, 7]
Sekhon, Parneet [1, 4]
Kagieva, Madina [1, 2, 3]
Reilly, Taylor [1, 2, 3]
Qi, Ruo Chen [1, 8]
Salman, Paarsa [1, 2, 3]
Hong, Ryan J. [1, 4]
Ma, Eric J. [9]
Hallam, Steven J. [1, 4, 9, 10, 11, 12]
Affiliations
[1] Univ British Columbia, BC Canc Agcy, Radiat Oncol, Vancouver, BC, Canada
[2] Univ British Columbia, Dept Bot, Vancouver, BC, Canada
[3] Univ British Columbia, Dept Zool, Vancouver, BC, Canada
[4] Univ British Columbia, Dept Microbiol & Immunol, Vancouver, BC, Canada
[5] Univ British Columbia, Dept Comp Sci, Vancouver, BC, Canada
[6] Univ British Columbia, Dept Chem & Biol Engn, Vancouver, BC, Canada
[7] Univ British Columbia, Fac Land & Food Syst, Vancouver, BC, Canada
[8] Univ British Columbia, Dept Cellular & Physiol Sci, Vancouver, BC, Canada
[9] Univ British Columbia, Grad Program Bioinformat, Vancouver, BC, Canada
[10] Univ British Columbia, Genome Sci & Technol Program, Vancouver, BC, Canada
[11] Univ British Columbia, Life Sci Inst, Vancouver, BC, Canada
[12] Univ British Columbia, ECOSCOPE Training Program, Vancouver, BC, Canada
Source
FRONTIERS IN SYSTEMS BIOLOGY | 2024 / Vol. 4
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
deep learning; regression; protein evolution; SARS-CoV-2; spike protein; small data; predictive model; GAUSSIAN PROCESS REGRESSION; VACCINE; MODEL;
DOI
10.3389/fsysb.2024.1284668
Chinese Library Classification (CLC) number
Q [Biological sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
The emergence of SARS-CoV-2 variants during the COVID-19 pandemic caused frequent global outbreaks that confounded public health efforts across many jurisdictions, highlighting the need for better understanding and prediction of viral evolution. Predictive models have been shown to support disease prevention efforts, such as with the seasonal influenza vaccine, but they require abundant data. For emerging viruses of concern, such models should ideally function with the relatively sparse data typically encountered at the early stages of a viral outbreak. Conventional discrete approaches have proven difficult to develop due to the spurious and reversible nature of amino acid mutations and the overwhelming number of possible protein sequences, which adds computational complexity. We hypothesized that these challenges could be addressed by encoding discrete protein sequences into continuous numbers, effectively reducing the data size while enhancing the resolution of evolutionarily relevant differences. To this end, we developed a viral protein evolution prediction model (VPRE), which reduces amino acid sequences into continuous numbers by using an artificial neural network called a variational autoencoder (VAE) and models their most statistically likely evolutionary trajectories over time using Gaussian process (GP) regression. To demonstrate VPRE, we used a small number of early SARS-CoV-2 spike protein sequences. We show that the VAE can be trained on a synthetic dataset based on these data. To recapitulate evolution along a phylogenetic path, we used only 104 spike protein sequences and trained the GP regression with the numerical variables to project evolution up to 5 months into the future. Our predictions contained novel variants, and the most frequent prediction mapped primarily to a sequence that differed by only a single amino acid from the most reported spike protein within the prediction timeframe. Novel variants in the spike receptor binding domain (RBD) were capable of binding human angiotensin-converting enzyme 2 (ACE2) in silico, with comparable or better binding than previously resolved RBD-ACE2 complexes. Together, these results indicate the utility and tractability of combining deep learning and regression to model viral protein evolution with relatively sparse datasets, toward developing more effective medical interventions.
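The two-stage idea in the abstract (map discrete sequences to continuous numbers, then regress those numbers over time) can be illustrated with a minimal sketch. This is not the authors' VPRE code: the one-hot encoding is only an assumed input representation that a VAE might compress, the latent values below are made-up toy numbers standing in for VAE outputs, and the RBF kernel, its hyperparameters, the function names, and the monthly time unit are all illustrative assumptions.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq):
    """Flatten a protein string into a 0/1 vector of length 20 * len(seq).

    Assumption: this is the kind of discrete input a VAE could compress;
    the abstract does not specify the encoding actually used.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    out = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        out[pos, index[aa]] = 1.0
    return out.ravel()


def rbf(a, b, length_scale=2.0, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of time points."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_predict(t_train, z_train, t_test, noise=1e-2):
    """GP posterior mean and standard deviation of one latent coordinate.

    A constant mean (the training average) is assumed; hyperparameters
    are fixed rather than fitted, purely for illustration.
    """
    mu = z_train.mean()
    K = rbf(t_train, t_train) + noise * np.eye(len(t_train))
    K_s = rbf(t_test, t_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, z_train - mu))
    mean = mu + K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    cov = rbf(t_test, t_test) - v.T @ v
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std


# Toy latent trajectory: one hypothetical latent coordinate per month.
months = np.arange(6, dtype=float)
latent = np.array([0.10, 0.18, 0.25, 0.37, 0.41, 0.55])

# Project the coordinate five months past the last observation.
future = np.arange(6, 11, dtype=float)
pred_mean, pred_std = gp_predict(months, latent, future)
```

In a full pipeline each predicted latent point would be passed back through the VAE decoder to recover a candidate amino acid sequence; that step is omitted here because the abstract gives no architectural details.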
Pages: 13