PhD-SNPg: a webserver and lightweight tool for scoring single nucleotide variants

Cited by: 173
Authors
Capriotti, Emidio [1]
Fariselli, Piero [2]
Affiliations
[1] Univ Bologna, Dept Biol Geol & Environm Sci BiGeA, Via F Selmi 3, I-40126 Bologna, Italy
[2] Univ Padua, Dept Comparat Biomed & Food Sci, Viale Univ 16, I-35020 Legnaro, PD, Italy
Keywords
SEQUENCE; BIOINFORMATICS;
DOI
10.1093/nar/gkx369
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
One of the major challenges in human genetics is to identify the functional effects of coding and non-coding single nucleotide variants (SNVs). In the past, several methods have been developed to identify disease-related single amino acid changes, but only a few tools are able to score the impact of non-coding variants. Among the most popular algorithms, CADD and FATHMM predict the effect of SNVs in non-coding regions by combining sequence conservation with several functional features derived from the ENCODE project data. Thus, running CADD or FATHMM locally requires downloading a large set of pre-calculated information during installation. To facilitate the process of variant annotation, we developed PhD-SNPg, a new easy-to-install and lightweight machine learning method that depends only on sequence-based features. Despite this, PhD-SNPg performs similarly to or better than more complex methods. This makes PhD-SNPg ideal for quick SNV interpretation and as a benchmark for tool development. Availability: PhD-SNPg is accessible at http://snps.biofold.org/phdsnpg.
Pages: W247-W252
Number of pages: 6