Classification of Imbalanced Bioassay Data with Features Learned Using Stacked Autoencoder

Cited by: 0
Authors
Shah, Jeni [1]
Joshi, Manjunath [1]
Affiliations
[1] Dhirubhai Ambani Inst Informat & Commun Technol, Gandhinagar, India
Keywords
Stacked Autoencoder; SMOTE; Imbalanced data
DOI
10.1117/12.2679627
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Bioassay data classification is an important task in drug discovery. However, the data used for classification are highly imbalanced, which leads to inaccurate classification of the minority class. We propose a novel approach in which we train separate models on different features derived by training stacked autoencoders (SAE). Experiments are performed on 7 bioassay datasets; each data file consists of feature descriptors for every compound along with a class label marking the compound as active or inactive. We first clean the data using the borderline synthetic minority oversampling technique (SMOTE) followed by removal of Tomek links, and then learn different features hierarchically from the cleaned data or feature vectors. We then train separate cost-sensitive feed-forward neural network (FNN) classifiers on the hierarchical features to obtain the final classification. To increase the true positive rate (TPR), a test sample is labeled as active if at least one classifier predicts it as active. We demonstrate that data cleaning and learning separate classifiers improve the TPR and F1 score compared to other machine learning approaches. To the best of our knowledge, the use of SAE and FNN for classifying bioassay data has not been attempted before.
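The decision rule in the abstract (a sample is active if at least one classifier says so) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `or_vote` and `tpr_and_f1` and the toy predictions are hypothetical, and the sketch assumes "active" is encoded as 1 and treated as the positive class.

```python
from typing import List, Sequence, Tuple


def or_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-classifier labels: 1 (active) if any classifier predicts 1."""
    # zip(*predictions) groups the votes that all classifiers cast for one sample.
    return [int(any(votes)) for votes in zip(*predictions)]


def tpr_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (true positive rate, F1 score) with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if (precision + tpr) else 0.0
    return tpr, f1


if __name__ == "__main__":
    # Hypothetical toy data: 5 test compounds, 3 classifiers (not from the paper).
    y_true = [1, 1, 0, 0, 1]
    clf_a = [1, 0, 0, 0, 0]
    clf_b = [0, 1, 0, 0, 0]
    clf_c = [0, 0, 1, 0, 0]

    combined = or_vote([clf_a, clf_b, clf_c])
    print(combined)
    print(tpr_and_f1(y_true, combined))
    print(tpr_and_f1(y_true, clf_a))
```

On this toy data each single classifier finds only one of the three active compounds, while the combined vote finds two at the cost of one false positive. That is the trade-off the rule makes: it can only raise recall on the active class, never lower it, but it may admit extra false positives.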
Pages: 7
Related papers
50 in total
  • [41] Imbalanced data classification using MapReduce and relief
    Jedrzejowicz, Joanna
    Kostrzewski, Robert
    Neumann, Jakub
    Zakrzewska, Magdalena
    JOURNAL OF INFORMATION AND TELECOMMUNICATION, 2018, 2 (02) : 217 - 230
  • [42] Hand Crafted Features for Efficient Lung Cancer Diagnosis Using Stacked Autoencoder
    Shaffie, Ahmed
    Soliman, Ahmed
    van Berkel, Victor
    El-Baz, Ayman
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 4378 - 4384
  • [43] A Semi-Supervised Stacked Autoencoder Using the Pseudo Label for Classification Tasks
    Lai, Jie
    Wang, Xiaodan
    Xiang, Qian
    Quan, Wen
    Song, Yafei
    ENTROPY, 2023, 25 (09)
  • [44] Experiments on classification of electroencephalography (EEG) signals in imagination of direction using Stacked Autoencoder
    Tomonaga, Kenta
    Hayakawa, Takuya
    Kobayashi, Jun
    ICAROB 2017: PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON ARTIFICIAL LIFE AND ROBOTICS, 2017, : P468 - P471
  • [45] A Novel Stacked Model for Classification of Vocal Cord Paralysis Over Imbalanced Vocal Data
    Hegde, K. Jayashree
    Shenoy, K. Manjula
    Devaraja, K.
    IEEE ACCESS, 2025, 13 : 10559 - 10581
  • [46] Classification of imbalanced PubChem BioAssay data using an efficient algorithm coupled with synthetic minority over-sampling technique
    Hao, Ming
    Wang, Yanli
    Bryant, Stephen H.
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2014, 247
  • [47] Segmenting Brain Tissues from Chinese Visible Human Dataset by Deep-Learned Features with Stacked Autoencoder
    Zhao, Guangjun
    Wang, Xuchu
    Niu, Yanmin
    Tan, Liwen
    Zhang, Shao-Xiang
    BIOMED RESEARCH INTERNATIONAL, 2016, 2016
  • [48] Stacked generalizations in imbalanced fraud data sets using resampling methods
    Kerwin, Kathleen R.
    Bastian, Nathaniel D.
    JOURNAL OF DEFENSE MODELING AND SIMULATION-APPLICATIONS METHODOLOGY TECHNOLOGY-JDMS, 2021, 18 (03) : 175 - 192
  • [49] Binary classification for imbalanced data using data conformity mechanism
    Zheng, Jian
    Ren, Shumiao
    Zhang, Jingyue
    Wang, Shiyan
    Li, Lin
    MULTIMEDIA SYSTEMS, 2025, 31 (01)
  • [50] Imbalanced Data Stream Classification Using Hybrid Data Preprocessing
    Bobowska, Barbara
    Klikowski, Jakub
    Wozniak, Michal
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II, 2020, 1168 : 402 - 413