Handling class imbalance in high-dimensional biomedical datasets

Cited by: 3
Authors
Pes, Barbara [1]
Affiliation
[1] Univ Cagliari, Dipartimento Matemat & Informat, Via Osped 72, I-09124 Cagliari, Italy
Source
2019 IEEE 28TH INTERNATIONAL CONFERENCE ON ENABLING TECHNOLOGIES: INFRASTRUCTURE FOR COLLABORATIVE ENTERPRISES (WETICE) | 2019
Keywords
Biomedical data analysis; Dimensionality reduction; Feature selection; Class-imbalance; Class balancing methods; Cost-sensitive classification; CLASSIFICATION;
DOI
10.1109/WETICE.2019.00040
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
When dealing with biomedical data, the first and most challenging issue is often the huge dimensionality, i.e. the presence of a very large number of features for each problem instance. A vast literature covers dimensionality reduction techniques suited to this kind of data, with a special focus on feature selection algorithms that discard uninformative features. In most cases, however, the dimensionality issue is addressed without jointly considering other potential problems in the data, including an imbalanced class distribution that may hinder the construction of effective classification models. Class imbalance, in turn, has mostly been treated in the literature as an independent problem, especially in application fields where the number of features is less critical. Yet several biomedical datasets are both high-dimensional and class-imbalanced, so there is a strong need to design and evaluate learning strategies that can properly deal with both issues simultaneously. In this work, we experiment with feature selection techniques used in conjunction with sampling-based class balancing methods and cost-sensitive classification, in order to gain insight into the most effective strategies for such complex data.
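The abstract does not name the specific algorithms used in the paper, so the following is only a minimal sketch of how the three ingredients it mentions (feature selection, sampling-based balancing, cost-sensitive weighting) might fit together. Everything here is an assumption made for illustration: the synthetic data, the function names `rank_features`, `random_undersample`, and `balanced_class_weights`, the simple mean-difference score used as a filter, and the inverse-frequency cost heuristic. None of it is taken from the paper.

```python
import random
from statistics import mean, pstdev


def rank_features(X, y, k):
    """Return the indices of the k features with the largest class-separation score.

    Hypothetical filter: |mean difference| divided by the summed class spreads.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row in pos]
        b = [row[j] for row in neg]
        spread = pstdev(a) + pstdev(b) + 1e-12  # guard against division by zero
        scores.append(abs(mean(a) - mean(b)) / spread)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]


def random_undersample(X, y, rng):
    """Randomly drop majority-class rows until both classes have the same size."""
    by_class = {c: [i for i, label in enumerate(y) if label == c] for c in (0, 1)}
    n_min = min(len(by_class[0]), len(by_class[1]))
    keep = sorted(rng.sample(by_class[0], n_min) + rng.sample(by_class[1], n_min))
    return [X[i] for i in keep], [y[i] for i in keep]


def balanced_class_weights(y):
    """Misclassification costs inversely proportional to class frequency."""
    classes = sorted(set(y))
    return {c: len(y) / (len(classes) * y.count(c)) for c in classes}


# Synthetic stand-in for a high-dimensional, imbalanced dataset (assumption):
# 200 instances, 50 features, 10% minority class, only features 0 and 1 informative.
rng = random.Random(0)
y = [1] * 20 + [0] * 180
X = [
    [rng.gauss(2.0 if (label == 1 and j < 2) else 0.0, 1.0) for j in range(50)]
    for label in y
]

# Step 1: reduce dimensionality with the filter.
selected = rank_features(X, y, k=5)
X_sel = [[row[j] for j in selected] for row in X]

# Step 2a: sampling-based balancing on the reduced data.
X_bal, y_bal = random_undersample(X_sel, y, rng)

# Step 2b: alternatively, keep all data and weight errors by class.
weights = balanced_class_weights(y)

print("selected features:", selected)
print("balanced class counts:", y_bal.count(0), y_bal.count(1))
print("class weights:", weights)
```

In this sketch the balancing step and the cost weights are alternatives applied after feature selection; whether selection should come before or after balancing is exactly the kind of question the abstract says the paper investigates, so the ordering here should not be read as a recommendation.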
Pages: 150-155
Page count: 6
Related papers
50 items in total
  • [21] Balancing High-Dimensional Datasets with Complex Layers
    Bobrowski, Leon
    24TH INTERNATIONAL CONFERENCE ON ENGINEERING APPLICATIONS OF NEURAL NETWORKS, EAAAI/EANN 2023, 2023, 1826 : 62 - 70
  • [22] Detecting Trivariate Associations in High-Dimensional Datasets
    Liu, Chuanlu
    Wang, Shuliang
    Yuan, Hanning
    Dang, Yingxu
    Liu, Xiaojia
    SENSORS, 2022, 22 (07)
  • [23] A general framework for clustering high-dimensional datasets
    Zhao, YC
    Junde, S
    CCECE 2003: CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, VOLS 1-3, PROCEEDINGS: TOWARD A CARING AND HUMANE TECHNOLOGY, 2003, : 1091 - 1094
  • [24] High-dimensional feature selection for genomic datasets
    Afshar, Majid
    Usefi, Hamid
    KNOWLEDGE-BASED SYSTEMS, 2020, 206
  • [25] Dealing with high-dimensional class-imbalanced datasets: Embedded feature selection for SVM classification
    Maldonado, Sebastian
    Lopez, Julio
    APPLIED SOFT COMPUTING, 2018, 67 : 94 - 105
  • [26] Visualization of high-dimensional biomedical image data
    Serocka, Peter
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2007, 2007, 4810 : 475 - 482
  • [27] Efficient Representation Learning for High-Dimensional Imbalance Data
    Mirza, Bilal
    Kok, Stanley
    Lin, Zhiping
    Yeo, Yong Kiang
    Lai, Xiaoping
    Cao, Jiuwen
    Sepulveda, Jose
    2016 IEEE INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2016, : 511 - 515
  • [28] Publishing Private High-dimensional Datasets: A Topological Approach
    Alipourjeddi, Narges
    Miri, Ali
    2022 INTERNATIONAL WIRELESS COMMUNICATIONS AND MOBILE COMPUTING, IWCMC, 2022, : 1142 - 1147
  • [29] An alternative SMOTE oversampling strategy for high-dimensional datasets
    Maldonado, Sebastian
    Lopez, Julio
    Vairetti, Carla
    APPLIED SOFT COMPUTING, 2019, 76 : 380 - 389
  • [30] Improved PSO for Feature Selection on High-Dimensional Datasets
    Tran, Binh
    Xue, Bing
    Zhang, Mengjie
    SIMULATED EVOLUTION AND LEARNING (SEAL 2014), 2014, 8886 : 503 - 515