Comparative Studies on Resampling Techniques in Machine Learning and Deep Learning Models for Drug-Target Interaction Prediction

Cited by: 11
Authors
Azlim Khan, Azwaar Khan [1]
Ahamed Hassain Malim, Nurul Hashimah [1]
Affiliations
[1] Univ Sains Malaysia, Sch Comp Sci, George Town 11800, Malaysia
Keywords
drug-target interaction; data resampling; machine learning; deep learning; class imbalance; SMOTE; DISCOVERY; IDENTIFICATION; THERAPY
DOI
10.3390/molecules28041663
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
The prediction of drug-target interactions (DTIs) is a vital step in drug discovery, and the accuracy of machine learning and deep learning methods in predicting DTIs plays a major role in it. However, the datasets used with learning algorithms are usually high-dimensional and extremely imbalanced, so they must be resampled accordingly. In this paper, we compare several data resampling techniques for overcoming class imbalance in machine learning methods, and we study how effectively deep learning methods overcome class imbalance in DTI prediction, framed as binary classification on ten cancer-related activity classes from BindingDB. We find that Random Undersampling (RUS) severely degrades model performance, especially when the dataset is highly imbalanced, which renders RUS unreliable. We also find that SVM-SMOTE can serve as a go-to resampling method when paired with the Random Forest and Gaussian Naive Bayes classifiers: it yields a high F1 score for all severely and moderately imbalanced activity classes. Additionally, the deep learning method Multilayer Perceptron recorded high F1 scores for all activity classes even when no resampling method was applied.
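The two ideas the abstract relies on most, random undersampling and the F1 score, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy data (95 inactive versus 5 active samples) and the fixed seed are assumptions made purely for illustration, and only the Python standard library is used.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop samples from the larger classes until every class
    has as many samples as the smallest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, n_min))
    kept.sort()  # keep the original sample order
    return [X[i] for i in kept], [y[i] for i in kept]


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up data: 95 inactive (0) and 5 active (1) samples.
X = [[i] for i in range(100)]
y = [0] * 95 + [1] * 5

X_rus, y_rus = random_undersample(X, y)
class_counts = Counter(y_rus)  # 5 samples per class: 90 majority samples discarded

# A classifier that always predicts "inactive" is 95% accurate but has F1 = 0.
always_inactive = [0] * len(y)
accuracy = sum(t == p for t, p in zip(y, always_inactive)) / len(y)
f1 = f1_score(y, always_inactive)
```

On this toy set, undersampling keeps only 10 of the 100 samples, which suggests why discarding majority data may hurt on highly imbalanced classes, and the always-inactive predictor shows why F1 is more informative than accuracy in this setting.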
Pages: 22