Quantitative Missense Variant Effect Prediction Using Large-Scale Mutagenesis Data

Cited by: 131
Authors
Gray, Vanessa E. [1]
Hause, Ronald J. [1]
Luebeck, Jens [1]
Shendure, Jay [1,2]
Fowler, Douglas M. [1,3]
Affiliations
[1] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[2] Howard Hughes Med Inst, Seattle, WA 98195 USA
[3] Univ Washington, Dept Bioengn, Seattle, WA 98195 USA
Funding
National Science Foundation (USA);
Keywords
PROTEIN; MUTATIONS; EFFICIENT; PATHWAY; IMPACT; GENES; TOOLS;
DOI
10.1016/j.cels.2017.11.003
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Large datasets describing the quantitative effects of mutations on protein function are becoming increasingly available. Here, we leverage these datasets to develop Envision, which predicts the magnitude of a missense variant's molecular effect. Envision combines 21,026 variant effect measurements from nine large-scale experimental mutagenesis datasets, a hitherto untapped training resource, with a supervised, stochastic gradient boosting learning algorithm. Envision outperforms other missense variant effect predictors both on large-scale mutagenesis data and on an independent test dataset comprising 2,312 TP53 variants whose effects were measured using a low-throughput approach. This dataset was never used for hyperparameter tuning or model training and thus serves as an independent validation set. Envision prediction accuracy is also more consistent across amino acids than other predictors. Finally, we demonstrate that Envision's performance improves as more large-scale mutagenesis data are incorporated. We precompute Envision predictions for every possible single amino acid variant in human, mouse, frog, zebrafish, fruit fly, worm, and yeast proteomes (https://envision.gs.washington.edu/).
Pages: 116 / +
Page count: 12