Quantitative Missense Variant Effect Prediction Using Large-Scale Mutagenesis Data

Cited by: 131
Authors
Gray, Vanessa E. [1]
Hause, Ronald J. [1]
Luebeck, Jens [1]
Shendure, Jay [1, 2]
Fowler, Douglas M. [1, 3]
Affiliations
[1] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[2] Howard Hughes Med Inst, Seattle, WA 98195 USA
[3] Univ Washington, Dept Bioengn, Seattle, WA 98195 USA
Funding
National Science Foundation (USA);
Keywords
PROTEIN; MUTATIONS; EFFICIENT; PATHWAY; IMPACT; GENES; TOOLS;
DOI
10.1016/j.cels.2017.11.003
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Large datasets describing the quantitative effects of mutations on protein function are becoming increasingly available. Here, we leverage these datasets to develop Envision, which predicts the magnitude of a missense variant's molecular effect. Envision combines 21,026 variant effect measurements from nine large-scale experimental mutagenesis datasets, a hitherto untapped training resource, with a supervised, stochastic gradient boosting learning algorithm. Envision outperforms other missense variant effect predictors both on large-scale mutagenesis data and on an independent test dataset comprising 2,312 TP53 variants whose effects were measured using a low-throughput approach. This dataset was never used for hyperparameter tuning or model training and thus serves as an independent validation set. Envision prediction accuracy is also more consistent across amino acids than other predictors. Finally, we demonstrate that Envision's performance improves as more large-scale mutagenesis data are incorporated. We precompute Envision predictions for every possible single amino acid variant in human, mouse, frog, zebrafish, fruit fly, worm, and yeast proteomes (https://envision.gs.washington.edu/).
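The abstract names a supervised, stochastic gradient boosting learner. As a rough illustration of that general technique only, the sketch below boosts one-split regression trees (stumps) on a random subsample of the training rows at each round. It is not Envision's implementation: the two features, the synthetic effect scores, and every function name and parameter value here are assumptions made up for the example.

```python
import random

def fit_stump(rows, residuals):
    """Fit a one-split regression tree minimizing squared error."""
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(rows, residuals) if row[j] <= t]
            right = [r for row, r in zip(rows, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no feature varies: fall back to a constant
        m = sum(residuals) / len(residuals)
        return (0, float("inf"), m, m)
    return best[1:]

def stump_predict(stump, row):
    j, t, left_mean, right_mean = stump
    return left_mean if row[j] <= t else right_mean

def fit_sgb(rows, ys, n_rounds=100, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting for squared-error loss."""
    rng = random.Random(seed)
    base = sum(ys) / len(ys)              # initial constant prediction
    preds = [base] * len(ys)
    k = min(len(ys), max(1, int(subsample * len(ys))))
    stumps = []
    for _ in range(n_rounds):
        idx = rng.sample(range(len(ys)), k)      # the "stochastic" step
        sub_rows = [rows[i] for i in idx]
        sub_res = [ys[i] - preds[i] for i in idx]  # negative gradient
        stump = fit_stump(sub_rows, sub_res)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, rows)]
    return base, learning_rate, stumps

def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)

def mse(predictions, ys):
    return sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(ys)

# Hypothetical toy data: (conservation, unrelated noise-like feature) per
# variant, with a synthetic effect score that falls as conservation rises.
rows = [(i / 39, (i * 7 % 40) / 39) for i in range(40)]
ys = [1.0 - 0.8 * conservation for conservation, _ in rows]

model = fit_sgb(rows, ys)
baseline_mse = mse([sum(ys) / len(ys)] * len(ys), ys)
model_mse = mse([predict(model, row) for row in rows], ys)
```

Fitting each stump on a subsample rather than on all rows is what distinguishes the stochastic variant from plain gradient boosting; the `subsample` fraction and `learning_rate` are the kind of hyperparameters that would be tuned on held-out data.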
Pages: 116 / +
Page count: 12
Related papers
38 in total
  • [1] A method and server for predicting damaging missense mutations
    Adzhubei, Ivan A.
    Schmidt, Steffen
    Peshkin, Leonid
    Ramensky, Vasily E.
    Gerasimova, Anna
    Bork, Peer
    Kondrashov, Alexey S.
    Sunyaev, Shamil R.
    [J]. NATURE METHODS, 2010, 7 (04) : 248 - 249
  • [2] [Anonymous], 2001, SciPy: Open source scientific tools for Python
  • [3] XGBoost: A Scalable Tree Boosting System
    Chen, Tianqi
    Guestrin, Carlos
    [J]. KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, : 785 - 794
  • [4] Deng CX, 2000, BIOESSAYS, V22, P728
  • [5] Measuring the activity of protein variants on a large scale using deep mutational scanning
    Fowler, Douglas M.
    Stephany, Jason J.
    Fields, Stanley
    [J]. NATURE PROTOCOLS, 2014, 9 (09) : 2267 - 2284
  • [6] Fowler DM, 2014, NAT METHODS, V11, P801, DOI [10.1038/NMETH.3027, 10.1038/nmeth.3027]
  • [7] Stochastic gradient boosting
    Friedman, JH
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2002, 38 (04) : 367 - 378
  • [8] The power of multiplexed functional analysis of genetic variants
    Gasperini, Molly
    Starita, Lea
    Shendure, Jay
    [J]. NATURE PROTOCOLS, 2016, 11 (10) : 1782 - 1787
  • [9] AMINO-ACID DIFFERENCE FORMULA TO HELP EXPLAIN PROTEIN EVOLUTION
    GRANTHAM, R
    [J]. SCIENCE, 1974, 185 (4154) : 862 - 864
  • [10] The Evaluation of Tools Used to Predict the Impact of Missense Variants Is Hindered by Two Types of Circularity
    Grimm, Dominik G.
    Azencott, Chloe-Agathe
    Aicheler, Fabian
    Gieraths, Udo
    MacArthur, Daniel G.
    Samocha, Kaitlin E.
    Cooper, David N.
    Stenson, Peter D.
    Daly, Mark J.
    Smoller, Jordan W.
    Duncan, Laramie E.
    Borgwardt, Karsten M.
    [J]. HUMAN MUTATION, 2015, 36 (05) : 513 - 523