Using deep mutational scanning to benchmark variant effect predictors and identify disease mutations

Cited by: 98
Authors
Livesey, Benjamin J. [1]
Marsh, Joseph A. [1]
Affiliations
[1] Univ Edinburgh, Inst Genet & Mol Med, Human Genet Unit, MRC, Edinburgh, Midlothian, Scotland
Funding
UK Medical Research Council
Keywords
missense mutations; phenotype prediction; protein structure; saturation mutagenesis; variant effect; THIAMINE PYROPHOSPHOKINASE DEFICIENCY; MISSENSE VARIANTS; NOVO MUTATIONS; CONSEQUENCES; ANNOTATIONS; SPECTRUM; TOOLS;
DOI
10.15252/msb.20199380
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010 ; 081704 ;
Abstract
To deal with the huge number of novel protein-coding variants identified by genome and exome sequencing studies, many computational variant effect predictors (VEPs) have been developed. Such predictors are often trained and evaluated using different variant data sets, making a direct comparison between VEPs difficult. In this study, we use 31 previously published deep mutational scanning (DMS) experiments, which provide quantitative, independent phenotypic measurements for large numbers of single amino acid substitutions, in order to benchmark and compare 46 different VEPs. We also evaluate the ability of DMS measurements and VEPs to discriminate between pathogenic and benign missense variants. We find that DMS experiments tend to be superior to the top-ranking predictors, demonstrating the tremendous potential of DMS for identifying novel human disease mutations. Among the VEPs, DeepSequence clearly stood out, showing both the strongest correlations with DMS data and having the best ability to predict pathogenic mutations, which is especially remarkable given that it is an unsupervised method. We further recommend SNAP2, DEOGEN2, SNPs&GO, SuSPect and REVEL based upon their performance in these analyses.
Pages: 12