EvoRator2: Predicting Site-specific Amino Acid Substitutions Based on Protein Structural Information Using Deep Learning

Cited by: 2
Authors
Nagar, Natan [1]
Tubiana, Jerome [2]
Loewenthal, Gil [1]
Wolfson, Haim J. [2]
Ben Tal, Nir [3]
Pupko, Tal [1]
Affiliations
[1] Tel Aviv Univ, George S Wise Fac Life Sci, Shmunis Sch Biomed & Canc Res, Tel Aviv 69978, Israel
[2] Tel Aviv Univ, Raymond & Beverly Sackler Fac Exact Sci, Blavatnik Sch Comp Sci, IL-69978 Tel Aviv, Israel
[3] Tel Aviv Univ, George S Wise Fac Life Sci, Sch Neurobiol Biochem & Biophys, IL-69978 Tel Aviv, Israel
Funding
Israel Science Foundation
Keywords
protein evolution; protein structure; protein function; mutation; deep learning; EVOLUTIONARY CONSERVATION; STRUCTURE DATABASE; WEB SERVER; CONSURF; SEQUENCE; MUTATIONS; CATH;
DOI
10.1016/j.jmb.2023.168155
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010 ; 081704 ;
Abstract
Multiple sequence alignments (MSAs) are the workhorse of molecular evolution and structural biology research. From MSAs, the amino acids that are tolerated at each site during protein evolution can be inferred. However, little is known regarding the repertoire of tolerated amino acids in proteins when only a few or no sequence homologs are available, such as orphan and de novo designed proteins. Here we present EvoRator2, a deep-learning algorithm trained on over 15,000 protein structures that can predict which amino acids are tolerated at any given site, based exclusively on protein structural information mined from atomic coordinate files. We show that EvoRator2 obtained satisfying results for the prediction of position-weighted scoring matrices (PSSM). We further show that EvoRator2 obtained near state-of-the-art performance on proteins with high quality structures in predicting the effect of mutations in deep mutation scanning (DMS) experiments and that for certain DMS targets, EvoRator2 outperformed state-of-the-art methods. We also show that by combining EvoRator2's predictions with those obtained by a state-of-the-art deep-learning method that accounts for the information in the MSA, the prediction of the effect of mutation in DMS experiments was improved in terms of both accuracy and stability. EvoRator2 is designed to predict which amino-acid substitutions are tolerated in such proteins without many homologous sequences, including orphan or de novo designed proteins. We implemented our approach in the EvoRator web server (https://evorator.tau.ac.il). © 2023 Published by Elsevier Ltd.
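The abstract's starting point, that tolerated amino acids at each site can be inferred from an MSA, can be illustrated with a minimal sketch. This is not EvoRator2's method (which predicts such profiles from structure alone); it only shows the conventional MSA-based baseline as per-column amino-acid frequencies. The function names `column_profile` and `tolerated`, the toy alignment, and the frequency threshold are all assumptions made for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(msa, pseudocount=0.0):
    """Per-site amino-acid frequencies for a list of aligned sequences.

    Gap characters and any non-standard symbols are ignored.
    (Hypothetical helper, not part of any EvoRator release.)
    """
    if not msa:
        raise ValueError("empty alignment")
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("aligned sequences must have equal length")

    profile = []
    for i in range(length):
        counts = Counter(seq[i] for seq in msa if seq[i] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        if total == 0:
            # Column made up entirely of gaps: no information.
            profile.append({aa: 0.0 for aa in AMINO_ACIDS})
        else:
            profile.append(
                {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
            )
    return profile


def tolerated(profile, min_freq=0.1):
    """Amino acids whose observed frequency at each site reaches min_freq."""
    return [sorted(aa for aa, f in site.items() if f >= min_freq) for site in profile]


# Toy alignment (made up for illustration only).
toy_msa = ["MKV", "MRV", "MKI", "MKL", "M-V"]
profile = column_profile(toy_msa)
print(tolerated(profile))
```

With few or no homologs, such column counts become unreliable or unavailable, which is the gap the abstract says a structure-based predictor is meant to fill.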
Pages: 8