Prediction of pathogenic single amino acid substitutions using molecular fragment descriptors

Times cited: 4
Authors
Zadorozhny, Anton [1]
Smirnov, Anton [1]
Filimonov, Dmitry [2]
Lagunin, Alexey [1,2]
Affiliations
[1] Pirogov Russian National Research Medical University, Department of Bioinformatics, Bldg 1, Ostrovityanova Str., Moscow 117513, Russia
[2] Institute of Biomedical Chemistry, Department of Bioinformatics, Moscow, Russia
Keywords
IN-SILICO PREDICTION; PRED WEB-SERVICE; ACTIVITY SPECTRA; MUTATIONS; VARIANTS; DISEASE; SITES; PASS
DOI
10.1093/bioinformatics/btad484
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Next-generation sequencing technologies make it possible to detect rare genetic variants in individual patients. More than a dozen software tools and web services have been created to predict the pathogenicity of variants that change amino acid residues. Despite considerable efforts in this area, there is currently no ideal method for separating pathogenic from harmless variants, and pathogenicity assessments are often contradictory. In this article, we propose describing amino acid substitutions by the structural formulas of the corresponding protein peptides rather than by single-letter codes. This allowed us to investigate the effectiveness of a chemoinformatics approach for assessing the pathogenicity of variants associated with amino acid substitutions.

Results: Structure-activity relationship (SAR) analysis based on protein-specific data and on atom-centric substructural multilevel neighborhoods of atoms (MNA) descriptors of molecular fragments proved suitable for predicting the pathogenic effect of single amino acid variants. An MNA-based Naïve Bayes classifier, together with ClinVar and humsavar data, was used to build SAR models for 10 proteins. The performance of the models was compared with that of 11 prediction tools: 8 individual (SIFT 4G, PolyPhen2 HDIV, MutationAssessor, PROVEAN, FATHMM, MVP, LIST-S2, MutPred) and 3 consensus (M-CAP, MetaSVM, MetaLR). The accuracy of the MNA-based method varied across proteins (AUC: 0.631-0.993; MCC: 0.191-0.891). It was comparable to that of both the other individual predictors and third-party protein-specific predictors. For several proteins (BRCA1, BRCA2, COL1A2, and RYR1), the MNA-based method performed outstandingly, capturing the pathogenic effect of the structural changes introduced by amino acid substitutions.
Availability and implementation: The datasets are available as supplementary data at Bioinformatics online. A Python script that converts amino acid and nucleotide sequences from single-letter codes to SD files is available at https://github.com/SmirnygaTotoshka/SequenceToSDF. The authors provide trial licenses for the MultiPASS software to interested readers upon request.
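The Results paragraph combines two ideas: MNA descriptors, which encode each atom together with its neighbours up to a chosen depth, and a Naive Bayes classifier trained on the resulting descriptor sets, evaluated with AUC and MCC. The Python sketch below illustrates these pieces under stated assumptions; it is not the PASS/MultiPASS implementation. `mna_descriptors` is a simplified, hypothetical variant (heavy atoms only, bond orders and ring/chain marks ignored), `BernoulliNaiveBayes` is a generic Laplace-smoothed Bernoulli model rather than the PASS estimator, and the toy residue graphs and their labels are invented purely to exercise the code.

```python
import math
from collections import Counter


def mna_descriptors(labels, bonds, level=2):
    """Simplified MNA-style descriptors (hypothetical: heavy atoms only, no bond
    orders or ring/chain marks). Level 0 is the atom label; level k is the atom
    label followed by the sorted level-(k-1) descriptors of its neighbours."""
    neighbours = {i: [] for i in range(len(labels))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    current = list(labels)
    for _ in range(level):
        current = [
            labels[i] + "(" + "".join(sorted(current[j] for j in neighbours[i])) + ")"
            for i in range(len(labels))
        ]
    return set(current)  # unique descriptors of the whole molecule


class BernoulliNaiveBayes:
    """Generic Bernoulli Naive Bayes over descriptor sets, Laplace-smoothed."""

    def fit(self, samples, labels):
        self.vocab = set().union(*samples)
        self.classes = sorted(set(labels))
        self.log_prior, self.p = {}, {}
        for c in self.classes:
            docs = [s for s, t in zip(samples, labels) if t == c]
            self.log_prior[c] = math.log(len(docs) / len(samples))
            counts = Counter(d for s in docs for d in s)  # samples containing d
            self.p[c] = {d: (counts[d] + 1) / (len(docs) + 2) for d in self.vocab}
        return self

    def log_scores(self, sample):
        # log P(c) + sum over the vocabulary of log P(presence/absence of d | c)
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for d in self.vocab:
                p = self.p[c][d]
                s += math.log(p) if d in sample else math.log(1.0 - p)
            scores[c] = s
        return scores

    def predict(self, sample):
        scores = self.log_scores(sample)
        return max(scores, key=scores.get)

    def predict_proba(self, sample, positive=1):
        # posterior of the positive class via a numerically stable softmax
        scores = self.log_scores(sample)
        m = max(scores.values())
        z = sum(math.exp(v - m) for v in scores.values())
        return math.exp(scores[positive] - m) / z


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented heavy-atom graphs: backbone N(0)-CA(1)-C(2)-O(3) plus a side chain.
TOY_RESIDUES = {
    "ALA": (["N", "C", "C", "O", "C"], [(0, 1), (1, 2), (2, 3), (1, 4)]),
    "SER": (["N", "C", "C", "O", "C", "O"], [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5)]),
    "CYS": (["N", "C", "C", "O", "C", "S"], [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5)]),
    "PRO": (["N", "C", "C", "O", "C", "C", "C"],
            [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5), (5, 6), (6, 0)]),
}

if __name__ == "__main__":
    X = [mna_descriptors(*TOY_RESIDUES[r]) for r in ("ALA", "SER", "CYS", "PRO")]
    y = [0, 0, 1, 1]  # invented labels with no biological meaning
    model = BernoulliNaiveBayes().fit(X, y)
    preds = [model.predict(x) for x in X]
    probs = [model.predict_proba(x) for x in X]
    print("MCC:", mcc(y, preds), "AUC:", roc_auc(y, probs))
```

Treating each molecule as a set of unique descriptors matches the Bernoulli event model used here; a multinomial model over descriptor counts would be an equally plausible assumption, and how the published method featurizes wild-type versus mutant peptides is not specified in this record.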
Pages: 8