Predicting mutant outcome by combining deep mutational scanning and machine learning

Times cited: 10
Authors
Sarfati, Hagit [1]
Naftaly, Si [2,3]
Papo, Niv [2,3]
Keasar, Chen [1]
Affiliations
[1] Ben Gurion Univ Negev, Dept Comp Sci, Beer Sheva, Israel
[2] Ben Gurion Univ Negev, Avram & Stella Goldstein Goren Dept Biotechnol En, Beer Sheva, Israel
[3] Ben Gurion Univ Negev, Natl Inst Biotechnol Negev, Beer Sheva, Israel
Funding
Israel Science Foundation; European Research Council
Keywords
deep mutational scanning; machine learning; mutant outcome; prediction; protein library; protein-protein interactions; random forest; specificity; structural features; structural stability; PROTEIN BINDING-AFFINITY; STABILITY CHANGES; POTENTIALS; STATE
DOI
10.1002/prot.26184
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Deep mutational scanning provides an unprecedented wealth of quantitative data on the functional outcomes of mutations in proteins. A single experiment may measure properties (e.g., structural stability) of numerous protein variants. Leveraging these experimental data to gain insights into unexplored regions of the mutational landscape is a major computational challenge. Such insights may facilitate further experimental work and accelerate the development of novel protein variants with beneficial therapeutic or industrially relevant properties. Here we present a novel machine learning approach for predicting functional mutation outcomes in the context of deep mutational screens. Using sequence (one-hot) features of variants with known properties, as well as structural features derived from models of these variants, we train predictive statistical models to estimate the unknown properties of other variants. The utility of the new computational scheme is demonstrated using five sets of mutational scanning data, denoted "targets": (a) protease specificity of APPI (amyloid precursor protein inhibitor) variants; (b-d) three stability-related properties of IGBPG (immunoglobulin G-binding beta 1 domain of streptococcal protein G) variants; and (e) fluorescence of GFP (green fluorescent protein) variants. Performance is measured by the overall correlation between predicted and observed properties, and by enrichment, that is, the ability to predict the most potent variants and thus, presumably, to guide further experiments. Despite the diversity of the targets, the statistical models can generalize from each target's variant examples and predict the properties of test variants carrying both single and multiple mutations.
Pages: 45-57
Number of pages: 13