Classification of Imbalanced Bioassay Data with Features Learned Using Stacked Autoencoder

Cited by: 0
Authors
Shah, Jeni [1]
Joshi, Manjunath [1]
Affiliation
[1] Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, India
Keywords
Stacked Autoencoder; SMOTE; Imbalanced data
DOI
10.1117/12.2679627
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Bioassay data classification is an important task in drug discovery. However, bioassay data are highly imbalanced, which leads to inaccurate classification of the minority class. We propose a novel approach in which separate classifiers are trained on different features derived from stacked autoencoders (SAEs). Experiments are performed on seven bioassay datasets; each data file contains feature descriptors for every compound together with a class label indicating whether the compound is active or inactive. We first clean the data by applying the borderline synthetic minority oversampling technique (borderline-SMOTE) and then removing Tomek links. We then learn features hierarchically with SAEs, training the first level on the cleaned data and each subsequent level on the feature vectors produced by the level below it. Next, we train separate cost-sensitive feed-forward neural network (FNN) classifiers on these hierarchical features to obtain the final classification. To increase the true positive rate (TPR), a test sample is labeled active if at least one classifier predicts it as active. We demonstrate that data cleaning combined with separately trained classifiers improves the TPR and F1 score compared with other machine learning approaches. To the best of our knowledge, SAEs and FNNs have not previously been applied to bioassay data classification.
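The preprocessing step can be illustrated with a dependency-free sketch of Borderline-SMOTE (the "borderline-1" variant, which oversamples only minority samples whose neighbourhoods are mostly, but not entirely, majority class) followed by Tomek-link removal. The abstract does not state the neighbourhood size, the amount of oversampling, or which members of a Tomek link are removed, so `k = 5`, balancing the classes to equal size, and dropping both members of each link are assumptions, as are the function names. In practice, imbalanced-learn's `BorderlineSMOTE` and `TomekLinks` classes provide tested implementations.

```python
import math
import random


def k_nearest(X, i, k, candidates=None):
    """Indices of the k points closest to X[i] (Euclidean), excluding i itself."""
    pool = range(len(X)) if candidates is None else candidates
    others = sorted((j for j in pool if j != i), key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def borderline_smote(X, y, minority=1, k=5, n_new=None, seed=0):
    """Borderline-SMOTE-1 sketch: synthesize minority samples only around
    minority points in 'danger', i.e. at least half, but not all, of their
    k nearest neighbours are majority class. For brevity the same k is used
    for the danger test and for choosing interpolation partners (assumption)."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    danger = []
    for i in minority_idx:
        n_major = sum(1 for j in k_nearest(X, i, k) if y[j] != minority)
        if k / 2 <= n_major < k:
            danger.append(i)
    if n_new is None:
        # Assumed default: add enough samples to match the majority class size.
        n_new = (len(y) - len(minority_idx)) - len(minority_idx)
    X_out, y_out = [list(x) for x in X], list(y)
    if not danger or len(minority_idx) < 2 or n_new <= 0:
        return X_out, y_out
    for _ in range(n_new):
        i = rng.choice(danger)
        # Interpolate towards a random minority-class neighbour of X[i].
        j = rng.choice(k_nearest(X, i, k, candidates=minority_idx))
        gap = rng.random()
        X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link: a pair of opposite-class
    points that are each other's nearest neighbour (assumed variant)."""
    nearest = [k_nearest(X, i, 1)[0] for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nearest):
        if nearest[j] == i and y[i] != y[j]:
            drop.update((i, j))
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical 1-D toy data: 8 inactive (0) and 5 active (1) compounds.
X = [[float(v)] for v in (0, 1, 2, 3, 4, 5, 6, 7, 4.5, 5.5, 10, 11, 12)]
y = [0] * 8 + [1] * 5
X_bal, y_bal = borderline_smote(X, y)
X_clean, y_clean = remove_tomek_links(X_bal, y_bal)
```

The last two lines show the order the abstract describes: oversample first, then clean the enlarged set by removing Tomek links. Only the two active points near the class boundary (4.5 and 5.5) qualify as "danger" samples here, so all synthetic points are generated around them.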
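The hierarchical feature learning can be sketched as greedy layer-wise training of small sigmoid autoencoders, each trained on the codes produced by the previous one. This is a minimal pure-Python illustration, not the authors' architecture: the layer sizes, learning rate, epoch count, squared-error loss, untied encoder/decoder weights, and inputs scaled to [0, 1] are all assumptions, and `train_autoencoder`, `stacked_features`, and `encode_all` are hypothetical names.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_autoencoder(data, n_hidden, epochs=100, lr=0.5, seed=0):
    """Train one sigmoid autoencoder by per-sample gradient descent on
    0.5 * ||reconstruction - input||^2 and return its encoder."""
    rng = random.Random(seed)
    n_in = len(data[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    V = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    c = [0.0] * n_in

    def encode(x):
        # Hidden code h = sigmoid(W x + b); W and b are updated in place below.
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(W, b)]

    for _ in range(epochs):
        for x in data:
            h = encode(x)
            r = [sigmoid(sum(v * hj for v, hj in zip(row, h)) + ci)
                 for row, ci in zip(V, c)]
            # Error signals at the output and hidden layers (sigmoid derivatives),
            # computed before any weight is changed.
            d_out = [(ri - xi) * ri * (1 - ri) for ri, xi in zip(r, x)]
            d_hid = [sum(d_out[i] * V[i][j] for i in range(n_in)) * h[j] * (1 - h[j])
                     for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    V[i][j] -= lr * d_out[i] * h[j]
                c[i] -= lr * d_out[i]
            for j in range(n_hidden):
                for k in range(n_in):
                    W[j][k] -= lr * d_hid[j] * x[k]
                b[j] -= lr * d_hid[j]
    return encode


def stacked_features(data, layer_sizes, **kwargs):
    """Greedy layer-wise SAE: each autoencoder is trained on the codes of the
    previous one. Returns the encoders and one feature matrix per level."""
    encoders, levels, current = [], [], data
    for size in layer_sizes:
        encode = train_autoencoder(current, size, **kwargs)
        current = [encode(x) for x in current]
        encoders.append(encode)
        levels.append(current)
    return encoders, levels


def encode_all(encoders, x):
    """Map an unseen sample to its features at every level of the stack."""
    features = []
    for encode in encoders:
        x = encode(x)
        features.append(x)
    return features


# Hypothetical toy data: 6 compounds x 4 descriptors, scaled to [0, 1].
data = [[0.9, 0.8, 0.1, 0.0], [1.0, 0.7, 0.2, 0.1], [0.8, 0.9, 0.0, 0.2],
        [0.1, 0.2, 0.9, 0.8], [0.0, 0.1, 0.8, 1.0], [0.2, 0.0, 1.0, 0.9]]
encoders, levels = stacked_features(data, layer_sizes=[3, 2])
test_features = encode_all(encoders, [0.5, 0.5, 0.5, 0.5])
```

Each entry of `levels` (and of `encode_all` for unseen compounds) is a separate feature representation that could feed its own classifier. A practical implementation would use a deep learning framework with minibatch training rather than this per-sample loop.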
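The decision rule and a cost-sensitive training objective can be sketched as follows. The abstract does not give the FNN's loss or its misclassification costs, so the class-weighted binary cross-entropy below (with a positive-class weight such as the inactive-to-active ratio) is an assumption. `or_combine` implements the stated rule that a sample is active if any classifier predicts it as active, and `tpr_and_f1` computes the two reported metrics. All names and the toy predictions are hypothetical.

```python
import math


def weighted_bce(p, y, w_pos, w_neg=1.0, eps=1e-12):
    """Cost-sensitive binary cross-entropy for one sample: a missed active
    compound (y = 1) is penalized w_pos times, a false alarm w_neg times."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1.0 - p))


def or_combine(predictions):
    """Label a sample active (1) if at least one classifier predicts active."""
    return [int(any(votes)) for votes in zip(*predictions)]


def tpr_and_f1(y_true, y_pred):
    """True positive rate (recall) and F1 score for the active class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return tpr, f1


# Hypothetical predictions of three classifiers on six test compounds.
y_true = [1, 1, 1, 0, 0, 0]
clf_a = [1, 0, 0, 0, 0, 0]
clf_b = [0, 1, 0, 1, 0, 0]
clf_c = [0, 0, 0, 0, 0, 0]
combined = or_combine([clf_a, clf_b, clf_c])
print(tpr_and_f1(y_true, clf_a))     # approx. (0.333, 0.5)
print(tpr_and_f1(y_true, combined))  # approx. (0.667, 0.667)
```

In this toy example, OR-combining raises the TPR from 1/3 to 2/3 while admitting one extra false positive; whether F1 also improves depends on how many false positives the rule lets through.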
Pages: 7