Handling class imbalance in high-dimensional biomedical datasets

Times cited: 3
Authors
Pes, Barbara [1]
Affiliations
[1] Univ Cagliari, Dipartimento Matemat & Informat, Via Osped 72, I-09124 Cagliari, Italy
Source
2019 IEEE 28TH INTERNATIONAL CONFERENCE ON ENABLING TECHNOLOGIES: INFRASTRUCTURE FOR COLLABORATIVE ENTERPRISES (WETICE) | 2019
Keywords
Biomedical data analysis; Dimensionality reduction; Feature selection; Class-imbalance; Class balancing methods; Cost-sensitive classification; CLASSIFICATION;
DOI
10.1109/WETICE.2019.00040
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
When dealing with biomedical data, the first and most challenging issue is often the huge dimensionality, i.e., the presence of a very large number of features for each problem instance. A vast literature is available on dimensionality reduction techniques suitable for handling this kind of data, with a special focus on feature selection algorithms that discard uninformative features. In most cases, however, the dimensionality issue is addressed without jointly considering other potential problems in the data, including an imbalanced class distribution that may hinder the construction of effective classification models. Class imbalance, in turn, has mostly been treated in the literature as an independent problem, especially in application fields where the number of features is not so critical. But several biomedical datasets are both high-dimensional and class-imbalanced, so there is a strong need to design and evaluate learning strategies that can properly deal with both issues simultaneously. In this work, we experiment with using feature selection techniques in conjunction with sampling-based class balancing methods and cost-sensitive classification, in order to gain insight into the most effective strategies to use when dealing with such complex data.
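As a rough illustration of how the three ingredients named in the abstract can be combined, the sketch below chains a univariate feature filter, random undersampling of the majority class, and inverse-frequency misclassification costs. It is not the paper's method: the record does not say which selectors, samplers, or cost settings were used, so the scoring rule, the function names (`rank_features`, `random_undersample`, `class_costs`), and the synthetic data are all assumptions made for this example.

```python
import random
from statistics import mean, pstdev


def rank_features(X, y, k):
    """Univariate filter (assumed scoring rule): standardized gap between class means."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = (pstdev(pos) + pstdev(neg)) or 1.0  # guard against zero spread
        scores.append((abs(mean(pos) - mean(neg)) / spread, j))
    # Keep the indices of the k highest-scoring features
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def random_undersample(X, y, rng):
    """Sampling-based balancing: keep a random majority subset the size of the minority class."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = sorted((pos, neg), key=len)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in keep], [y[i] for i in keep]


def class_costs(y):
    """Cost-sensitive alternative: inverse-frequency costs, n / (n_classes * class_count)."""
    n = len(y)
    classes = set(y)
    return {c: n / (len(classes) * y.count(c)) for c in classes}


# Synthetic toy data (hypothetical): 90 majority and 10 minority instances,
# 50 features, of which only feature 3 carries class information.
rng = random.Random(0)
y = [0] * 90 + [1] * 10
X = [[rng.gauss(0, 1) for _ in range(50)] for _ in y]
for row, label in zip(X, y):
    row[3] += 5.0 * label

selected = rank_features(X, y, k=5)                  # step 1: reduce dimensionality
X_sel = [[row[j] for j in selected] for row in X]
X_bal, y_bal = random_undersample(X_sel, y, rng)     # step 2a: rebalance by sampling
costs = class_costs(y)                               # step 2b: or weight errors by class
```

In this sketch the balancing step and the cost computation are alternatives: a classifier would be trained either on `X_bal, y_bal` or on the full `X_sel, y` with errors weighted by `costs`. Performing feature selection before or after balancing is itself a design choice that this example does not settle.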
Pages: 150-155
Number of pages: 6