The Evaluation of Tools Used to Predict the Impact of Missense Variants Is Hindered by Two Types of Circularity

Times cited: 239
Authors
Grimm, Dominik G. [1,2,3,4]
Azencott, Chloe-Agathe [1,2,5,6,7]
Aicheler, Fabian [1,2,3]
Gieraths, Udo [1,2]
MacArthur, Daniel G. [8,9,10]
Samocha, Kaitlin E. [8,9,10]
Cooper, David N. [11]
Stenson, Peter D. [11]
Daly, Mark J. [8,9,10]
Smoller, Jordan W. [10,12,13]
Duncan, Laramie E. [8,9,10]
Borgwardt, Karsten M. [1,2,3,4]
Affiliations
[1] Max Planck Inst Intelligent Syst, Machine Learning & Computat Biol Res Grp, Tubingen, Germany
[2] Max Planck Inst Dev Biol, Tubingen, Germany
[3] Univ Tubingen, Zentrum Bioinformat, Tubingen, Germany
[4] Swiss Fed Inst Technol, Dept Biosyst Sci & Engn, Basel, Switzerland
[5] PSL Res Univ, MINES ParisTech, CBIO Ctr Computat Biol, Fontainebleau, France
[6] Inst Curie, Paris, France
[7] INSERM, Paris, France
[8] Massachusetts Gen Hosp, Analyt & Translat Genet Unit, Boston, MA 02114 USA
[9] Harvard Univ, Sch Med, Dept Med, Boston, MA USA
[10] Broad Inst MIT & Harvard, Cambridge, MA USA
[11] Cardiff Univ, Sch Med, Inst Med Genet, Cardiff CF10 3AX, S Glam, Wales
[12] Massachusetts Gen Hosp, Psychiat & Neurodev Genet Unit, Boston, MA 02114 USA
[13] Harvard Univ, Sch Med, Dept Psychiat, Boston, MA 02115 USA
Keywords
pathogenicity prediction tools; exome sequencing; functional impact; mutations; database; identification; consequences; library; SNVs
DOI
10.1002/humu.22768
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Prioritizing missense variants for further experimental investigation is a key challenge in current sequencing studies of complex and Mendelian diseases. A large number of in silico tools have been employed for pathogenicity prediction, including PolyPhen-2, SIFT, FatHMM, MutationTaster-2, MutationAssessor, Combined Annotation Dependent Depletion (CADD), LRT, phyloP, and GERP++, as well as optimized methods for combining tool scores, such as Condel and Logit. Given the wealth of these methods, an important practical question is which of these tools generalize best, that is, which correctly predict the pathogenic character of new variants. Here we demonstrate, in a study of 10 tools on five datasets, that such a comparative evaluation is hindered by two types of circularity. They arise when (1) the same variants or (2) different variants from the same protein occur both in the datasets used for training and in those used for evaluating these tools, which may lead to overly optimistic results. We show that comparative evaluations of predictors that do not address these types of circularity may erroneously conclude that circularity-confounded tools are the most accurate of all tools, and that they may even outperform optimized combinations of tools.
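The two types of circularity named in the abstract can be illustrated with a short Python sketch. It assumes a hypothetical `Variant` record holding a protein identifier, a variant label, and a pathogenicity annotation; `filter_circular` and `split_by_protein` are illustrative helpers, not part of any tool named above and not the authors' own code. The first removes evaluation variants that are circular with respect to a given training set (type 1: identical variant; type 2: different variant from a protein already seen in training). The second assigns whole proteins to either the training or the test side, so that neither type can arise in a fresh split.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    protein: str      # hypothetical protein identifier (e.g. a UniProt accession)
    change: str       # hypothetical amino-acid change label (e.g. "p.R123W")
    pathogenic: bool  # annotation used as the ground-truth label


def filter_circular(train, evaluation):
    """Return (evaluation without type 1 circularity, evaluation without both types)."""
    seen_variants = {(v.protein, v.change) for v in train}
    seen_proteins = {v.protein for v in train}
    # Type 1: the identical variant appears in both training and evaluation data.
    without_type1 = [v for v in evaluation
                     if (v.protein, v.change) not in seen_variants]
    # Type 2: a different variant from a protein that already contributed training data.
    without_both = [v for v in without_type1 if v.protein not in seen_proteins]
    return without_type1, without_both


def split_by_protein(variants, test_fraction=0.2, seed=0):
    """Split variants so that no protein contributes to both training and test sets."""
    proteins = sorted({v.protein for v in variants})
    random.Random(seed).shuffle(proteins)
    n_test = max(1, round(len(proteins) * test_fraction))
    test_proteins = set(proteins[:n_test])
    train = [v for v in variants if v.protein not in test_proteins]
    test = [v for v in variants if v.protein in test_proteins]
    return train, test


if __name__ == "__main__":
    train = [Variant("P1", "p.A1V", True), Variant("P2", "p.G5D", False)]
    evaluation = [
        Variant("P1", "p.A1V", True),   # type 1: identical to a training variant
        Variant("P1", "p.L9P", True),   # type 2: new variant, but protein P1 was in training
        Variant("P3", "p.K2E", False),  # free of both types
    ]
    without_type1, without_both = filter_circular(train, evaluation)
    print([v.change for v in without_type1])  # ['p.L9P', 'p.K2E']
    print([v.change for v in without_both])   # ['p.K2E']
```

Removing type 2 circularity also removes every type 1 case, since a shared variant implies a shared protein; the two filters are kept separate here only to make each type visible.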
Pages: 513-523
Number of pages: 11