Predicting mutant outcome by combining deep mutational scanning and machine learning

Times cited: 10
Authors
Sarfati, Hagit [1]
Naftaly, Si [2,3]
Papo, Niv [2,3]
Keasar, Chen [1]
Affiliations
[1] Ben Gurion Univ Negev, Dept Comp Sci, Beer Sheva, Israel
[2] Ben Gurion Univ Negev, Avram & Stella Goldstein Goren Dept Biotechnol En, Beer Sheva, Israel
[3] Ben Gurion Univ Negev, Natl Inst Biotechnol Negev, Beer Sheva, Israel
Funding
European Research Council; Israel Science Foundation
Keywords
deep mutational scanning; machine learning; mutant outcome; prediction; protein library; protein-protein interactions; random forest; specificity; structural features; structural stability; PROTEIN BINDING-AFFINITY; STABILITY CHANGES; POTENTIALS; STATE
DOI
10.1002/prot.26184
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Deep mutational scanning provides an unprecedented wealth of quantitative data on the functional outcomes of mutations in proteins. A single experiment may measure a property (e.g., structural stability) of numerous protein variants. Leveraging these experimental data to gain insights into unexplored regions of the mutational landscape is a major computational challenge. Such insights may facilitate further experimental work and accelerate the development of novel protein variants with beneficial therapeutic or industrially relevant properties. Here we present a novel machine learning approach for predicting functional mutation outcomes in the context of deep mutational screens. Using sequence (one-hot) features of variants with known properties, as well as structural features derived from models of those variants, we train predictive statistical models to estimate the unknown properties of other variants. The utility of the new computational scheme is demonstrated on five sets of mutational scanning data, denoted "targets": (a) protease specificity of APPI (amyloid precursor protein inhibitor) variants; (b-d) three stability-related properties of IGBPG (immunoglobulin G-binding beta 1 domain of streptococcal protein G) variants; and (e) fluorescence of GFP (green fluorescent protein) variants. Performance is measured by the overall correlation between predicted and observed properties, and by enrichment, that is, the ability to identify the most potent variants and thus, presumably, guide further experiments. Despite the diversity of the targets, the statistical models generalize from the example variants of each target and predict the properties of test variants carrying both single and multiple mutations.
Pages: 45-57
Page count: 13