Prediction of small molecule binding property of protein domains with Bayesian classifiers based on Markov chains

Cited by: 2
Authors
Bulashevska, Alla [1]
Stein, Martin [1]
Jackson, David [1]
Eils, Roland [1]
Affiliations
[1] German Canc Res Ctr, Theoret Bioinformat Dept, D-69120 Heidelberg, Germany
Keywords
Function prediction; Proteomics; Small molecule binding domains; Drug discovery; Bayesian classifiers; Markov chains
DOI
10.1016/j.compbiolchem.2009.09.005
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Accurate computational methods that help predict the biological function of a protein from its sequence are of great interest to research biologists and pharmaceutical companies. One approach to inferring the function of proteins is to predict the interactions between proteins and other molecules. In this work, we propose a machine learning method that uses the primary sequence of a domain to predict its propensity for interaction with small molecules. By curating the Pfam database with respect to the small molecule binding ability of its component domains, we have constructed a dataset of small molecule binding and non-binding domains. This dataset was then used as a training set to learn a Bayesian classifier that distinguishes members of each class. The domain sequences of both classes are modelled with Markov chains. In a jackknife test, our classification procedure achieved predictive accuracies of 77.2% and 66.7% for the binding and non-binding classes, respectively. We demonstrate the applicability of our classifier by using it to identify previously unknown small molecule binding domains. Our predictions are available as supplementary material and can provide useful information to drug discovery specialists. Given the ubiquitous and essential role small molecules play in biological processes, our method is important for identifying pharmaceutically relevant components of complete proteomes. The software is available from the author upon request. (C) 2009 Elsevier Ltd. All rights reserved.
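The classification scheme summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the order of the Markov chains, the smoothing scheme, or how class priors are set, so the first-order chain, the add-one pseudocount, and the frequency-based priors below are all assumptions. The names `MarkovChainModel`, `train_classifier`, `classify`, and `jackknife_accuracy` are hypothetical, and the toy sequences are invented for illustration only; this is not the authors' software.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class MarkovChainModel:
    """First-order Markov chain over amino acids (order is an assumption)."""

    def __init__(self, pseudocount=1.0):
        self.pseudocount = pseudocount  # add-one smoothing, an assumption
        self.start = {}
        self.trans = {}

    def fit(self, sequences):
        start_counts = defaultdict(float)
        trans_counts = defaultdict(lambda: defaultdict(float))
        for seq in sequences:
            residues = [c for c in seq if c in AMINO_ACIDS]
            if not residues:
                continue
            start_counts[residues[0]] += 1
            for prev, cur in zip(residues, residues[1:]):
                trans_counts[prev][cur] += 1
        k, pc = len(AMINO_ACIDS), self.pseudocount
        start_total = sum(start_counts.values()) + pc * k
        self.start = {a: math.log((start_counts[a] + pc) / start_total)
                      for a in AMINO_ACIDS}
        for a in AMINO_ACIDS:
            row_total = sum(trans_counts[a].values()) + pc * k
            self.trans[a] = {b: math.log((trans_counts[a][b] + pc) / row_total)
                             for b in AMINO_ACIDS}
        return self

    def log_likelihood(self, seq):
        residues = [c for c in seq if c in AMINO_ACIDS]
        if not residues:
            return 0.0
        total = self.start[residues[0]]
        for prev, cur in zip(residues, residues[1:]):
            total += self.trans[prev][cur]
        return total


def train_classifier(labelled):
    """labelled maps a class name to its training sequences."""
    n = sum(len(seqs) for seqs in labelled.values())
    # Each class gets a log prior (class frequency) and its own chain.
    return {label: (math.log(len(seqs) / n), MarkovChainModel().fit(seqs))
            for label, seqs in labelled.items()}


def classify(models, seq):
    """Bayes decision rule: pick the class with the highest log posterior."""
    return max(models,
               key=lambda c: models[c][0] + models[c][1].log_likelihood(seq))


def jackknife_accuracy(labelled):
    """Leave-one-out accuracy per class (each class needs >= 2 sequences)."""
    accuracy = {}
    for label, seqs in labelled.items():
        correct = 0
        for i, held_out in enumerate(seqs):
            reduced = dict(labelled)
            reduced[label] = seqs[:i] + seqs[i + 1:]
            correct += classify(train_classifier(reduced), held_out) == label
        accuracy[label] = correct / len(seqs)
    return accuracy


# Invented toy data, for illustration only.
toy = {
    "binding": ["GAGAGAGA", "GAGAGGAG"],
    "non-binding": ["LLKLLKLL", "KLLKLLKL"],
}
models = train_classifier(toy)
print(classify(models, "GAGAGA"))
print(jackknife_accuracy(toy))
```

Working in log space avoids numerical underflow on long domain sequences, and reporting jackknife accuracy per class mirrors the way the abstract quotes separate figures for the binding and non-binding classes.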
Pages: 457-460
Page count: 4