Using SMOTE to Deal with Class-Imbalance Problem in Bioactivity Data to Predict mTOR Inhibitors

Authors
Kumari C. [1]
Abulaish M. [2]
Subbarao N. [3]
Affiliations
[1] Department of Computer Science, Jamia Millia Islamia, New Delhi
[2] Department of Computer Science, South Asian University, New Delhi
[3] School of Computational and Integrative Biology, Jawaharlal Nehru University, New Delhi
Keywords
Autophagy; Class imbalance; Drug discovery; Kinase; Machine learning; mTOR; Proteomics; SMOTE; Virtual screening
DOI
10.1007/s42979-020-00156-5
Abstract
Machine learning algorithms give sub-optimal performance on class-imbalanced datasets. Mammalian target of rapamycin (mTOR) is a serine/threonine protein kinase that plays an integral role in the autophagy pathway. Autophagy is a cellular pathway for recycling macromolecules (proteins, lipids, and organelles), which enables eukaryotic cells to adapt their metabolism and survive adverse growth conditions. Therapeutic intervention in the autophagy pathway makes mTOR a promising pharmacological target for autophagy modulation in cancer. The bioactivity dataset of mTOR in ChEMBL, a compound bioactivity database maintained by the European Bioinformatics Institute, shows a disproportionate distribution of active and inactive classes. Predictive models built on this skewed dataset are biased towards predicting the majority class. Hence, we have used the Synthetic Minority Over-sampling TEchnique (SMOTE) to deal with the class-imbalance problem in bioactivity datasets. We have built and evaluated predictive models based on four commonly used classifiers using both class-imbalanced and class-balanced bioactivity datasets, and compared their performance on several metrics: accuracy, sensitivity, specificity, F1-measure, and AUC. We observe that the classification models based on the balanced dataset generally outperform those based on the class-imbalanced dataset, irrespective of the classifier used. We conclude that predictive models trained on the class-balanced dataset can be used for screening large compound bioactivity datasets to predict mTOR inhibitor-like compounds. © 2020, Springer Nature Singapore Pte Ltd.
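The oversampling step and some of the evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `smote` and `binary_metrics`, the neighbour count `k`, the random seed, and the toy two-dimensional points are all assumptions made for illustration. The sketch only shows the core SMOTE idea of placing synthetic minority samples on line segments between a minority sample and one of its nearest minority neighbours, and how sensitivity, specificity, and F1 follow from a confusion matrix.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority points.

    Hypothetical helper: each synthetic point lies on the segment joining a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (Euclidean distance).
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Candidate neighbours: every other minority sample, nearest first.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Report 0.0 instead of failing when a class is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # recall on the positive class
    specificity = ratio(tn, tn + fp)  # recall on the negative class
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Toy data (made up): 3 minority points versus 9 majority points.
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    majority_count = 9
    extra = smote(minority, majority_count - len(minority), k=2, seed=1)
    print(len(minority) + len(extra), "minority samples after oversampling")
    print(binary_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points never leak into the data used for evaluation.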