Drug-target interaction prediction via class imbalance-aware ensemble learning

Cited by: 99
Authors
Ezzat, Ali [1]
Wu, Min [2]
Li, Xiao-Li [2]
Kwoh, Chee-Keong [1]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Sci & Engn, Nanyang Ave, Singapore 639798, Singapore
[2] ASTAR, Inst Infocomm Res I2R, Fusionopolis Way, Singapore 138632, Singapore
Keywords
Drug-target interaction prediction; Class imbalance; Between-class imbalance; Within-class imbalance; Small disjuncts; Ensemble learning; DESCRIPTORS; KERNELS
DOI
10.1186/s12859-016-1377-y
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Multiple computational methods for predicting drug-target interactions have been developed to facilitate the drug discovery process. These methods use available data on known drug-target interactions to train classifiers that predict new, undiscovered interactions. However, a key challenge in this data that these methods have not yet addressed, namely class imbalance, may be degrading prediction performance. Class imbalance can be divided into two sub-problems. Firstly, the number of known interacting drug-target pairs is much smaller than the number of non-interacting pairs. This imbalance between interacting and non-interacting drug-target pairs is referred to as between-class imbalance. It degrades prediction performance because predictions are biased towards the majority class (i.e. the non-interacting pairs), leading to more prediction errors in the minority class (i.e. the interacting pairs). Secondly, the data contains multiple types of drug-target interactions, and some types have relatively fewer members (i.e. are less represented) than others. This variation in the representation of interaction types leads to another kind of imbalance, referred to as within-class imbalance. Under within-class imbalance, predictions are biased towards the better represented interaction types, leading to more prediction errors in the less represented ones.
Results: We propose an ensemble learning method that incorporates techniques to address both between-class imbalance and within-class imbalance. Experiments show that the proposed method improves results over four state-of-the-art methods. In addition, we simulated cases for new drugs and targets, i.e. those for which no prior interactions are known, to see how our method would perform in predicting their interactions. Our method displayed satisfactory prediction performance and was able to predict many of the interactions successfully.
Conclusions: Our proposed method improves prediction performance over existing work, which underlines the importance of addressing class imbalance problems in the data.
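Illustrative note: the between-class imbalance described in the abstract can be made concrete with a small sketch. The abstract does not specify how the ensemble members are trained, so the code below uses plain random undersampling of the majority class, a common generic technique that is assumed here for illustration and is not necessarily the authors' procedure. The function names `imbalance_ratio` and `balanced_subsets` and the toy label vector are hypothetical. Within-class imbalance is not covered by this sketch.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return (#majority / #minority) for binary labels containing both classes."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def balanced_subsets(labels, n_members, seed=0):
    """Build one index list per ensemble member.

    Each list holds every positive (interacting) pair plus an equally sized
    random draw of negatives (non-interacting pairs), so each base learner
    would see a balanced training set. Assumes negatives outnumber positives.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    subsets = []
    for _ in range(n_members):
        sampled_neg = rng.sample(neg, k=len(pos))  # undersample the majority class
        subsets.append(pos + sampled_neg)
    return subsets


# Toy data (hypothetical): 5 interacting pairs among 100 drug-target pairs.
labels = [1] * 5 + [0] * 95
subsets = balanced_subsets(labels, n_members=3)
```

Each index list would then be used to train one base classifier, and the ensemble would combine their predictions, for example by averaging scores. Because every member draws a different set of negatives, the ensemble as a whole still sees much of the majority class.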
Pages: 10