Using SMOTE to Deal with Class-Imbalance Problem in Bioactivity Data to Predict mTOR Inhibitors

Authors
Kumari C. [1]
Abulaish M. [2]
Subbarao N. [3]
Affiliations
[1] Department of Computer Science, Jamia Millia Islamia, New Delhi
[2] Department of Computer Science, South Asian University, New Delhi
[3] School of Computational and Integrative Biology, Jawaharlal Nehru University, New Delhi
Keywords
Autophagy; Class imbalance; Drug discovery; Kinase; Machine learning; mTOR; Proteomics; SMOTE; Virtual screening
DOI
10.1007/s42979-020-00156-5
Abstract
Machine learning algorithms perform sub-optimally on class-imbalanced datasets. Mammalian target of rapamycin (mTOR) is a serine/threonine protein kinase that plays an integral role in the autophagy pathway. Autophagy is a cellular pathway for recycling macromolecules (proteins, lipids, and organelles), which enables eukaryotic cells to adapt their metabolism and survive adverse growth conditions. Because the autophagy pathway can be modulated through therapeutic intervention, mTOR is a promising pharmacological target for autophagy modulation in cancer. The bioactivity dataset for mTOR in ChEMBL, a compound bioactivity database maintained by the European Bioinformatics Institute, shows a disproportionate distribution of active and inactive classes. Predictive models built on this skewed dataset are biased towards predicting the majority class. Hence, we have used the Synthetic Minority Over-sampling TEchnique (SMOTE) to deal with the class-imbalance problem in bioactivity datasets. We have built and evaluated predictive models based on four commonly used classifiers using both class-imbalanced and class-balanced bioactivity datasets, and compared their performance on accuracy, sensitivity, specificity, F1-measure, and AUC. We observe that the classification models trained on the balanced dataset generally outperform those trained on the class-imbalanced dataset, irrespective of the classifier used. We conclude that predictive models trained on the class-balanced dataset can be used to screen large compound bioactivity datasets for mTOR inhibitor-like compounds. © 2020, Springer Nature Singapore Pte Ltd.
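The oversampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it is a simplified, standard-library-only version of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), and the function name `smote`, the two-dimensional toy descriptors, and the parameter defaults are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE sketch: create n_synthetic points, each lying on the
    segment between a randomly chosen minority sample and one of its k
    nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 4 "active" versus 10 "inactive" compounds,
# each described by two made-up numeric descriptors.
actives = [[0.9, 1.1], [1.0, 1.0], [1.2, 0.8], [1.1, 1.3]]
inactives = [[float(i), float(-i)] for i in range(10)]

# Oversample the minority class until both classes have the same size.
new_actives = smote(actives, len(inactives) - len(actives), k=3)
balanced_actives = actives + new_actives
```

In practice, oversampling of this kind is usually applied to the training split only, so that the sensitivity, specificity, F1-measure, and AUC are still measured on real, unsynthesised compounds.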
Related papers
50 in total
  • [31] Evolutionary data analysis for the class imbalance problem
    Khoshgoftaar, Taghi M.
    Seliya, Naeem
    Drown, Dennis J.
    INTELLIGENT DATA ANALYSIS, 2010, 14 (01) : 69 - 88
  • [32] Spatial-SMOTE: An Approach for Handling Class Imbalance in Spatial Time Series Data
    Gavas, Rahul Dasharath
    Ghosh, Soumya Kanti
    Pal, Arpan
    PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PREMI 2021, 2024, 13102 : 488 - 497
  • [33] Commentary: The problem of class imbalance in biomedical data
    Ishwaran, Hemant
    O'Brien, Robert
    JOURNAL OF THORACIC AND CARDIOVASCULAR SURGERY, 2021, 161 (06) : 1940 - 1941
  • [34] Semantic Masking: A Novel Technique to Mitigate the Class-Imbalance Problem in Real-Time Semantic Segmentation
    Atif, Nadeem
    Balaji, H.
    Mazhar, Saquib
    Ahamad, Shaik Rafi
    Bhuyan, M. K.
    2022 NATIONAL CONFERENCE ON COMMUNICATIONS (NCC), 2022, : 407 - 412
  • [35] Identification of small open reading frames in plant lncRNA using class-imbalance learning
    Zhao, Siyuan
    Meng, Jun
    Wekesa, Jael Sanyanda
    Luan, Yushi
    COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 157
  • [36] Swift Imbalance Data Classification using SMOTE and Extreme Learning Machine
    Rustogi, Rishabh
    Prasad, Ayush
    2019 SECOND INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN DATA SCIENCE (ICCIDS 2019), 2019
  • [37] Implementation of ensemble machine learning classifiers to predict diarrhoea with SMOTEENN, SMOTE, and SMOTETomek class imbalance approaches
    Mbunge, Elliot
    Millham, Richard C.
    Sibiya, Maureen Nokuthula
    Chemhaka, Garikayi
    Takavarasha, Sam, Jr.
    Muchemwa, Benhildah
    Dzinamarira, Tafadzwa
    2023 CONFERENCE ON INFORMATION COMMUNICATIONS TECHNOLOGY AND SOCIETY, ICTAS, 2023, : 90 - 95
  • [38] Re-ACGAN: Structural damage identification with class-imbalance reweighted ACGAN for data augmentation
    Xiong, Qingsong
    Xia, Yong
    Xiong, Haibei
    Yuan, Cheng
    Chen, Jiawei
    Kong, Qingzhao
    ENGINEERING STRUCTURES, 2025, 329
  • [39] Loss Re-Scaling VQA: Revisiting the Language Prior Problem from a Class-Imbalance View
    Guo, Yangyang
    Nie, Liqiang
    Cheng, Zhiyong
    Tian, Qi
    Zhang, Min
    IEEE Transactions on Image Processing, 2022, 31 : 227 - 238
  • [40] IMNRFixer: A hybrid approach to alleviate class-imbalance problem for predicting the fixability of Non-Reproducible bugs
    Goyal, Anjali
    Sardana, Neetu
    JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS, 2023, 35 (03)