Paying attention to the SARS-CoV-2 dialect: a deep neural network approach to predicting novel protein mutations

Cited by: 0
Authors
Elkin, Magdalyn E. [1]
Zhu, Xingquan [1]
Affiliations
[1] Florida Atlantic Univ, Dept Elect Engn & Comp Sci, 777 Glades Rd, Boca Raton, FL 33431 USA
Funding
National Science Foundation (USA);
Keywords
LANGUAGE; EVOLUTION;
DOI
10.1038/s42003-024-07262-7
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Predicting novel mutations has long-lasting impacts on life science research. Traditionally, this problem is addressed through wet-lab experiments, which are often expensive and time-consuming. Recent advances in neural language models have produced strong results in modeling and deciphering sequences. In this paper, we propose a Deep Novel Mutation Search (DNMS) method, which uses deep neural networks to model protein sequences for mutation prediction. We use the SARS-CoV-2 spike protein as the target and use a protein language model to predict novel mutations. Unlike existing research, which is often limited to mutating the reference sequence for prediction, we propose a parent-child mutation prediction paradigm in which a parent sequence is modeled for mutation prediction. Because mutations change the context of the underlying sequence, DNMS models three aspects of the protein sequences: semantic changes, grammatical changes, and attention changes, which respectively capture shifts in semantics, grammatical coherence, and amino-acid interactions in latent space. A ranking approach is proposed to combine all three aspects and capture mutations demonstrating evolving traits, in accordance with real-world SARS-CoV-2 spike protein sequence evolution. DNMS can be adopted in an early-warning variant detection system, raising public health awareness of future SARS-CoV-2 mutations.
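The abstract says that a ranking approach combines the three aspects but does not spell out the procedure. As a rough illustration only, the sketch below shows one common way to merge several per-candidate scores: rank the candidates under each score and sum the ranks. The rank-sum rule, the function names, the assumption that a higher score is better, and the candidate labels and score values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def rank_descending(scores: Dict[str, float]) -> Dict[str, int]:
    """Return each candidate's rank, where 1 is the highest score.

    Ties are broken alphabetically by candidate label (an arbitrary choice).
    """
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: position + 1 for position, name in enumerate(ordered)}


def combine_ranks(
    semantic: Dict[str, float],
    grammar: Dict[str, float],
    attention: Dict[str, float],
) -> List[str]:
    """Order candidates by the sum of their three per-aspect ranks.

    Assumes all three dictionaries share the same candidate labels and that
    a higher score is more favorable in every aspect (both hypothetical).
    """
    per_aspect = [rank_descending(s) for s in (semantic, grammar, attention)]
    total = {name: sum(r[name] for r in per_aspect) for name in semantic}
    # Lowest rank sum first; ties broken alphabetically.
    return sorted(total, key=lambda name: (total[name], name))


if __name__ == "__main__":
    # Made-up scores for three placeholder candidates, for illustration only.
    semantic = {"cand_a": 0.9, "cand_b": 0.5, "cand_c": 0.1}
    grammar = {"cand_a": 0.8, "cand_b": 0.2, "cand_c": 0.6}
    attention = {"cand_a": 0.7, "cand_b": 0.3, "cand_c": 0.4}
    print(combine_ranks(semantic, grammar, attention))
```

Summing ranks rather than raw scores keeps the three aspects comparable even when their scales differ; the method in the paper may weight or combine them differently.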
Pages: 16