Computationally Probing Drug-Protein Interactions Via Support Vector Machine

Times cited: 49
Authors
Wang, Yong-Cui [1]
Yang, Zhi-Xia [2]
Wang, Yong [3]
Deng, Nai-Yang [1]
Affiliations
[1] China Agr Univ, Coll Sci, Beijing 100083, Peoples R China
[2] Xinjiang Univ, Coll Math & Syst Sci, Urumuchi 830046, Peoples R China
[3] Chinese Acad Sci, Acad Math & Syst Sci, Beijing 100190, Peoples R China
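Funding
National Natural Science Foundation of China
Keywords
Drug-target interaction; Chemical structure; Protein sequence; Imbalance problem; Support vector machine; Diversity-oriented synthesis
DOI
10.2174/157018010791163433
Chinese Library Classification number
R914 [Medicinal chemistry]
Discipline classification code
100701
Abstract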
The past decades have witnessed extensive efforts to study the relationships between small molecules (drugs, metabolites, or ligands) and proteins, motivated by the scale and complexity of their physical and genetic interactions. In particular, computational prediction of drug-protein interactions is fundamentally important for speeding up the development of novel therapeutic agents. Here, we present a supervised learning method based on the support vector machine (SVM) to predict drug-protein interactions, introducing two machine learning ideas. First, the chemical structure similarity among drugs and the genomic sequence similarity among proteins are intuitively encoded as a feature vector that represents a given drug-protein pair. Second, we design an automatic procedure to select a gold-standard negative dataset to deal with the training data imbalance issue, i.e., gold-standard positive data are scarce relative to large-scale unlabeled data. Our SVM-based predictor is validated on four classes of drug target proteins: enzymes, ion channels, G-protein-coupled receptors, and nuclear receptors. We find that our method improves on existing methods in terms of true positive rate at a given false positive rate. Functional annotation analysis and database searches indicate that our new predictions are worthy of future experimental validation. In addition, follow-up analysis suggests that our method can partly capture the topological features of the drug-protein interaction network. In conclusion, our new method can efficiently identify potential drug-protein bindings and will promote further research in drug discovery.
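The pair encoding described in the abstract can be illustrated with a minimal sketch. It assumes one plausible reading, namely that a drug-protein pair is represented by concatenating the drug's similarity profile against all drugs with the protein's similarity profile against all proteins; the abstract does not give the exact construction. The function name `pair_features`, the identifiers, and the toy similarity values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

Similarity = Dict[str, Dict[str, float]]


def pair_features(
    drug: str,
    protein: str,
    drug_sim: Similarity,
    protein_sim: Similarity,
    drug_order: Sequence[str],
    protein_order: Sequence[str],
) -> List[float]:
    """Encode one drug-protein pair as a single feature vector.

    Assumption (not confirmed by the abstract): the vector is the drug's
    row of chemical-structure similarities followed by the protein's row
    of sequence similarities, each in a fixed order.
    """
    drug_part = [drug_sim[drug][other] for other in drug_order]
    protein_part = [protein_sim[protein][other] for other in protein_order]
    return drug_part + protein_part


# Toy, made-up similarity scores in [0, 1] (hypothetical data).
drug_sim: Similarity = {
    "d1": {"d1": 1.0, "d2": 0.4},
    "d2": {"d1": 0.4, "d2": 1.0},
}
protein_sim: Similarity = {
    "p1": {"p1": 1.0, "p2": 0.2, "p3": 0.7},
    "p2": {"p1": 0.2, "p2": 1.0, "p3": 0.1},
    "p3": {"p1": 0.7, "p2": 0.1, "p3": 1.0},
}

features = pair_features(
    "d1", "p3", drug_sim, protein_sim,
    drug_order=["d1", "d2"],
    protein_order=["p1", "p2", "p3"],
)
print(features)
```

Vectors built this way for labeled positive pairs and selected negative pairs could then be passed to any standard SVM implementation; the negative-selection procedure itself is not specified in the abstract and is not sketched here.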
Pages: 370-378
Number of pages: 9