Using SMOTE to Deal with Class-Imbalance Problem in Bioactivity Data to Predict mTOR Inhibitors

Authors
Kumari C. [1]
Abulaish M. [2]
Subbarao N. [3]
Affiliations
[1] Department of Computer Science, Jamia Millia Islamia, New Delhi
[2] Department of Computer Science, South Asian University, New Delhi
[3] School of Computational and Integrative Biology, Jawaharlal Nehru University, New Delhi
Keywords
Autophagy; Class imbalance; Drug discovery; Kinase; Machine learning; mTOR; Proteomics; SMOTE; Virtual screening
DOI
10.1007/s42979-020-00156-5
Abstract
Machine learning algorithms perform sub-optimally when trained on class-imbalanced datasets. Mammalian target of rapamycin (mTOR) is a serine/threonine protein kinase that plays an integral role in the autophagy pathway. Autophagy is a cellular pathway for recycling macromolecules (proteins, lipids, and organelles), which enables eukaryotic cells to adapt their metabolism and survive adverse growth conditions. Therapeutic intervention in the autophagy pathway through mTOR makes mTOR a promising pharmacological target for autophagy modulation in cancer. The mTOR bioactivity dataset in ChEMBL, a compound bioactivity database maintained by the European Bioinformatics Institute, shows a disproportionate distribution of active and inactive classes. Predictive models built on this skewed dataset are biased towards predicting the majority class. Hence, we use the Synthetic Minority Over-sampling TEchnique (SMOTE) to deal with the class-imbalance problem in bioactivity datasets. We built and evaluated predictive models based on four commonly used classifiers using both class-imbalanced and class-balanced bioactivity datasets, and compared their performance on metrics including accuracy, sensitivity, specificity, F1-measure, and AUC. We observe that classification models based on the balanced dataset generally outperform those based on the class-imbalanced dataset, irrespective of the classifier used. We conclude that predictive models trained on the class-balanced dataset can be used to screen large compound bioactivity datasets for mTOR inhibitor-like compounds. © 2020, Springer Nature Singapore Pte Ltd.
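As a rough illustration of the oversampling idea named in the abstract, the sketch below generates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours, then computes some of the listed metrics from a confusion matrix. It is not the authors' implementation: the function names `smote` and `binary_metrics`, the neighbour count `k`, the two-dimensional toy descriptor vectors, and the example labels are all assumptions made for illustration, and the sampling scheme is a simplified variant that picks base points at random. AUC is omitted because it needs classifier scores rather than hard labels.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE sketch: create n_new synthetic minority points.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and F1 for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # recall on the positive class
    specificity = ratio(tn, tn + fp)  # recall on the negative class
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical toy data: 6 majority-class and 3 minority-class descriptor vectors.
majority = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.2], [0.1, 0.4]]
minority = [[0.8, 0.9], [0.9, 0.8], [1.0, 1.0]]

# Oversample the minority class until both classes have the same size.
synthetic = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + synthetic
print(len(majority), len(balanced_minority))

# Hypothetical predictions, only to show how the metrics are computed.
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1])
print(metrics)
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the data used for evaluation.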