Effectively Identifying Compound-Protein Interactions by Learning from Positive and Unlabeled Examples

Cited by: 16
Authors
Cheng, Zhanzhan [1, 2]
Zhou, Shuigeng [1, 2]
Wang, Yang [3 ]
Liu, Hui [4 ]
Guan, Jihong [5 ]
Chen, Yi-Ping Phoebe [2, 6]
Affiliations
[1] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[2] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[3] Jiangxi Normal Univ, Sch Software, Nanchang 330022, Jiangxi, Peoples R China
[4] Changzhou Univ, Lab Informat Management, Changzhou 213164, Jiangsu, Peoples R China
[5] Tongji Univ, Dept Comp Sci & Technol, Shanghai, Peoples R China
[6] La Trobe Univ, Dept Comp Sci & Comp Engn, Melbourne, Vic, Australia
Keywords
Compound-protein interaction prediction; PU learning; biased-SVM; DRUG-TARGET INTERACTIONS; INTERACTION PREDICTION; INTERACTION NETWORKS; DATABASE; INTEGRATION; RESOURCES; KERNELS
DOI
10.1109/TCBB.2016.2570211
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Compound-protein interaction (CPI) prediction aims to find new compound-protein pairs in which a protein is targeted by at least one compound, a crucial step in new drug design. A number of machine learning based methods have been developed to predict new CPIs. However, because no set of validated negative CPIs is publicly available yet, most existing machine learning based approaches randomly select unknown interactions (CPIs that have not been validated) as negative examples to train classifiers. This is questionable, since some unknown pairs may be true interactions, and it unavoidably degrades CPI prediction performance. In this paper, we simply take the unknown CPIs as unlabeled examples, and propose a new method called PUCPI (short for PU learning for Compound-Protein Interaction identification) that employs biased-SVM (Support Vector Machine) to predict CPIs using only positive and unlabeled examples. PU learning is a class of learning methods that learn from positive and unlabeled (PU) samples. To the best of our knowledge, this is the first work that identifies CPIs using only positive and unlabeled examples. We first collect known CPIs as positive examples and then randomly select compound-protein pairs not in the positive set as unlabeled examples. For each CPI/compound-protein pair, we extract protein domains as protein features and compound substructures as chemical features, then take the tensor product of the corresponding compound features and protein features as the feature vector of the pair. After that, biased-SVM is employed to train classifiers on different datasets of CPIs and compound-protein pairs. Experiments over various datasets show that our method outperforms six typical classifiers, namely random forest, L1- and L2-regularized logistic regression, naive Bayes, SVM and k-nearest neighbor (kNN), as well as three types of existing CPI prediction models.
More information can be found at http://admis.fudan.edu.cn/projects/pucpi.html.
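The two technical steps named in the abstract, the tensor-product feature vector and the biased-SVM, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the kernel, the solver, or the cost values, so the linear model, the stochastic subgradient solver, the penalties `c_pos` and `c_unl`, the function names, and the toy fingerprints below are all assumptions made for illustration. The sketch treats unlabeled pairs as negatives but penalizes their margin violations less than those of known positives, which is the general idea behind a biased-SVM.

```python
import random


def tensor_product(compound_bits, protein_bits):
    """Flattened outer product of a compound vector and a protein vector."""
    return [c * p for c in compound_bits for p in protein_bits]


def decision(w, b, x):
    """Linear decision value; larger means more likely to interact."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_biased_svm(X, s, c_pos=10.0, c_unl=1.0, epochs=200, lr=0.01, seed=0):
    """Linear biased-SVM trained by stochastic subgradient descent.

    s[i] is 1 for a known positive pair and 0 for an unlabeled pair.
    Objective (hypothetical cost values c_pos > c_unl):
        0.5 * ||w||^2 + c_pos * sum(hinge over positives)
                      + c_unl * sum(hinge over unlabeled, labeled -1)
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            y = 1.0 if s[i] == 1 else -1.0
            cost = c_pos if s[i] == 1 else c_unl
            margin = y * decision(w, b, X[i])
            # Subgradient of the regularizer, spread evenly over the n samples.
            for j in range(d):
                w[j] -= lr * w[j] / n
            # Subgradient of the class-weighted hinge loss.
            if margin < 1.0:
                for j in range(d):
                    w[j] += lr * cost * y * X[i][j]
                b += lr * cost * y
    return w, b


# Toy data (made up): 2-bit substructure and 2-bit domain fingerprints.
compounds = [[1, 0], [0, 1]]
proteins = [[1, 0], [0, 1]]
X = [tensor_product(c, p) for c in compounds for p in proteins]
s = [1, 0, 0, 0]  # only the first pair is a known interaction

w, b = train_biased_svm(X, s)
for x, label in zip(X, s):
    print(label, round(decision(w, b, x), 3))
```

In this toy setting the known positive pair receives a higher decision value than every unlabeled pair. With real data, the ratio between the two penalties would have to be tuned, for example on held-out positives.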
Pages: 1832-1843
Page count: 12