iDrug-Target: predicting the interactions between drug compounds and target proteins in cellular networking via benchmark dataset optimization approach

Times cited: 188
Authors
Xiao, Xuan [1 ,2 ,4 ]
Min, Jian-Liang [1 ]
Lin, Wei-Zhong [1 ]
Liu, Zi [1 ]
Cheng, Xiang [1 ]
Chou, Kuo-Chen [3 ,4 ]
Affiliations
[1] Jing De Zhen Ceram Inst, Dept Comp, Jingdezhen 333046, Peoples R China
[2] ZheJiang Text & Fash Coll, Informat Sch, Ningbo 315211, Zhejiang, Peoples R China
[3] King Abdulaziz Univ, Ctr Excellence Genom Med Res CEGMR, Jeddah 21589, Saudi Arabia
[4] Gordon Life Sci Inst, Boston, MA 02478 USA
Keywords
SMOTE; iDrug-GPCR; iDrug-Ezy; iDrug-Chl; NCR; Chou's PseAAC; iDrug-NR; molecular fingerprints; target-jackknife validation; optimized training data-set; AMINO-ACID-COMPOSITION; SEQUENCE-BASED PREDICTOR; LABEL LEARNING CLASSIFIER; SUPPORT VECTOR MACHINES; M2 PROTON CHANNEL; SUBCELLULAR LOCATION; TERTIARY STRUCTURE; 3D STRUCTURE; MECHANISM; INSIGHTS
DOI
10.1080/07391102.2014.998710
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Information about the interactions between drug compounds and proteins in cellular networking is very important for drug development. Unfortunately, all existing predictors for identifying drug-protein interactions were trained on skewed benchmark datasets in which non-interactive drug-protein pairs vastly outnumber interactive ones. Training predictors on such highly unbalanced benchmark datasets means that many interactive drug-protein pairs may be mispredicted as non-interactive. Since the minority interactive pairs often contain the most important information for drug design, it is necessary to minimize this kind of misprediction. In this study, we adopted the neighborhood cleaning rule (NCR) and the synthetic minority over-sampling technique (SMOTE) to treat the skewed benchmark datasets and balance their positive and negative subsets. The resulting datasets are called the optimized benchmark datasets, on the basis of which a new predictor called iDrug-Target was developed. It contains four sub-predictors, iDrug-GPCR, iDrug-Chl, iDrug-Ezy, and iDrug-NR, specialized for identifying the interactions of drug compounds with GPCRs (G-protein-coupled receptors), ion channels, enzymes, and NRs (nuclear receptors), respectively. Rigorous cross-validations on a set of experimentally confirmed datasets have indicated that these new predictors remarkably outperformed the existing ones for the same purpose. For users' convenience, a publicly accessible web server for iDrug-Target has been established at [GRAPHICS], by which users can easily get their desired results. It has not escaped our notice that the aforementioned strategy can be widely used in many other areas as well.
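The two resampling steps named in the abstract can be illustrated with a minimal two-class sketch: the neighborhood cleaning rule first removes majority-class samples that look like noise or border clutter, and SMOTE then adds synthetic minority samples by interpolating between nearby minority samples until the classes are balanced. This is written from the general published descriptions of NCR (Laurikkala) and SMOTE (Chawla et al.), not from the iDrug-Target code. Several things are assumptions for illustration only: each drug-target pair is already encoded as a plain numeric vector (the keywords suggest molecular fingerprints and Chou's PseAAC were used, but no encoding is reproduced here), label 1 marks the minority interactive pairs, the neighbour counts k = 3 and k = 5 are common defaults rather than the paper's settings, NCR is assumed to run before SMOTE, and the function names `neighbourhood_cleaning` and `smote` are hypothetical. The SMOTE variant draws its base samples at random instead of oversampling every minority sample a fixed number of times.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn(i, X, pool, k):
    """Indices of the k nearest neighbours of X[i], searched within `pool` (i excluded)."""
    cands = sorted((j for j in pool if j != i), key=lambda j: dist(X[i], X[j]))
    return cands[:k]


def neighbourhood_cleaning(X, y, minority=1, k=3):
    """Hypothetical two-class NCR sketch.

    A sample is "misclassified" when the majority vote of its k nearest
    neighbours disagrees with its own label. Misclassified majority samples are
    removed; for misclassified minority samples, their majority-class
    neighbours are removed instead. Minority samples are never deleted.
    """
    everyone = range(len(X))
    drop = set()
    for i in everyone:
        nbrs = knn(i, X, everyone, k)
        minority_votes = sum(1 for j in nbrs if y[j] == minority)
        predicted_minority = 2 * minority_votes > len(nbrs)
        if predicted_minority != (y[i] == minority):
            if y[i] == minority:
                drop.update(j for j in nbrs if y[j] != minority)
            else:
                drop.add(i)
    keep = [i for i in everyone if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote(X, y, minority=1, k=5, seed=0):
    """Hypothetical SMOTE sketch: add synthetic minority samples until classes are equal.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    if len(minority_idx) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    # Two classes assumed: (majority size) - (minority size).
    n_needed = len(y) - 2 * len(minority_idx)
    X_out, y_out = [list(x) for x in X], list(y)
    for _ in range(max(0, n_needed)):
        i = rng.choice(minority_idx)
        j = rng.choice(knn(i, X, minority_idx, k))
        gap = rng.random()  # uniform in [0, 1)
        X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out


if __name__ == "__main__":
    # Toy 2-D data: label 1 = interactive pair (minority), 0 = non-interactive.
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4],
         [0.4, 0.2], [0.9, 1.0], [1.0, 1.0], [1.1, 0.9], [0.95, 0.95]]
    y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
    Xc, yc = neighbourhood_cleaning(X, y)  # removes the stray 0 at (0.95, 0.95)
    Xb, yb = smote(Xc, yc)
    print(yc.count(0), yc.count(1), yb.count(0), yb.count(1))  # prints: 6 3 6 6
```

On the toy data, NCR drops the single non-interactive point sitting inside the interactive cluster, and SMOTE then adds three interpolated interactive points so that both classes end up with six samples. The neighbour search here is a brute-force O(n²) loop chosen for readability; for real benchmark datasets a maintained implementation, such as the `NeighbourhoodCleaningRule` and `SMOTE` classes in imbalanced-learn, would likely be the more practical choice.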
Pages: 2221-2233
Page count: 13