Predicting mutant outcome by combining deep mutational scanning and machine learning

Cited by: 10
Authors
Sarfati, Hagit [1]
Naftaly, Si [2, 3]
Papo, Niv [2, 3]
Keasar, Chen [1]
Affiliations
[1] Ben Gurion Univ Negev, Dept Comp Sci, Beer Sheva, Israel
[2] Ben Gurion Univ Negev, Avram & Stella Goldstein Goren Dept Biotechnol En, Beer Sheva, Israel
[3] Ben Gurion Univ Negev, Natl Inst Biotechnol Negev, Beer Sheva, Israel
Funding
European Research Council; Israel Science Foundation
Keywords
deep mutational scanning; machine learning; mutant outcome; prediction; protein library; protein-protein interactions; random forest; specificity; structural features; structural stability; PROTEIN BINDING-AFFINITY; STABILITY CHANGES; POTENTIALS; STATE;
DOI
10.1002/prot.26184
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Deep mutational scanning provides an unprecedented wealth of quantitative data regarding the functional outcome of mutations in proteins. A single experiment may measure properties (e.g., structural stability) of numerous protein variants. Leveraging the experimental data to gain insights about unexplored regions of the mutational landscape is a major computational challenge. Such insights may facilitate further experimental work and accelerate the development of novel protein variants with beneficial therapeutic or industrially relevant properties. Here we present a novel machine learning approach for the prediction of functional mutation outcome in the context of deep mutational screens. Using sequence (one-hot) features of variants with known properties, as well as structural features derived from models thereof, we train predictive statistical models to estimate the unknown properties of other variants. The utility of the new computational scheme is demonstrated using five sets of mutational scanning data, denoted "targets": (a) protease specificity of APPI (amyloid precursor protein inhibitor) variants; (b-d) three stability-related properties of IGBPG (immunoglobulin G-binding beta 1 domain of streptococcal protein G) variants; and (e) fluorescence of GFP (green fluorescent protein) variants. Performance is measured by the overall correlation of the predicted and observed properties, and by enrichment, that is, the ability to predict the most potent variants and presumably guide further experiments. Despite the diversity of the targets, the statistical models can generalize from variant examples thereof and predict the properties of test variants with both single and multiple mutations.
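The two ingredients named in the abstract, one-hot sequence features and evaluation by correlation and enrichment, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`one_hot`, `pearson`, `enrichment`), the 20-letter alphabet ordering, and in particular the enrichment definition (overlap between the top-ranked predicted and top-ranked observed variants, relative to random selection) are assumptions made for illustration, since the abstract does not give the exact formula.

```python
import math

# Assumed ordering of the 20 standard amino acids (illustrative choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence):
    """Encode a protein sequence as a flat 0/1 vector, 20 entries per position."""
    vector = []
    for residue in sequence:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def pearson(xs, ys):
    """Pearson correlation between predicted and observed values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def enrichment(predicted, observed, top_fraction=0.1):
    """Hypothetical enrichment score: the share of top-ranked predictions that
    are truly top-ranked, divided by the share expected from random picking."""
    n = len(observed)
    k = max(1, round(n * top_fraction))
    by_pred = sorted(range(n), key=lambda i: predicted[i], reverse=True)[:k]
    by_obs = sorted(range(n), key=lambda i: observed[i], reverse=True)[:k]
    hit_rate = len(set(by_pred) & set(by_obs)) / k
    return hit_rate / (k / n)
```

In a workflow like the one described, vectors from `one_hot` (possibly concatenated with structural features) would serve as inputs to a regression model such as a random forest, and `pearson` and `enrichment` would be computed on held-out test variants.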
Pages: 45-57
Page count: 13