Cost-sensitive learning strategies for high-dimensional and imbalanced data: a comparative study

Cited by: 13
Authors
Pes, Barbara [1 ]
Lai, Giuseppina [1 ]
Affiliations
[1] Univ Cagliari, Dipartimento Matemat & Informat, Cagliari, Italy
Keywords
Cost-sensitive learning; Class imbalance; High-dimensional data analysis; Feature selection; Random forest; FEATURE-SELECTION METHODS; RANDOM FORESTS; CLASSIFICATION; CLASSIFIERS; PERFORMANCE; CHALLENGES;
DOI
10.7717/peerj-cs.832
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
High dimensionality and class imbalance are widely recognized as important issues in machine learning. A vast literature has investigated approaches to the challenges that arise in high-dimensional feature spaces, where each problem instance is described by a large number of features. Likewise, several learning strategies have been devised to cope with the adverse effects of imbalanced class distributions, which may severely impair the generalization ability of the induced models. Nevertheless, although both issues have been studied extensively for several years, they have mostly been addressed separately, and their combined effects are yet to be fully understood. Indeed, little research has so far investigated which approaches are best suited to datasets that are both high-dimensional and class-imbalanced. To contribute in this direction, our work presents a comparative study of learning strategies that combine feature selection, to cope with high dimensionality, with cost-sensitive learning methods, to cope with class imbalance. Specifically, different ways of incorporating misclassification costs into the learning process are explored. Different feature selection heuristics, both univariate and multivariate, are also considered in order to compare their effectiveness on imbalanced data. The experiments were conducted on three challenging benchmarks from the genomic domain and offer insight into the beneficial impact of combining feature selection and cost-sensitive learning, especially in the presence of highly skewed data distributions.
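The abstract does not say which cost-sensitive schemes the study compares. As a purely illustrative sketch, one widely used way of incorporating misclassification costs is to shift the decision threshold so that the predicted class minimizes expected cost. The function names, the cost values, and the example scores below are all hypothetical and are not taken from the paper.

```python
def cost_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability threshold that minimizes expected cost in a binary problem,
    assuming correct predictions cost nothing (an assumption of this sketch)."""
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def predict_min_cost(p_minority: float, cost_fp: float, cost_fn: float) -> int:
    """Return 1 (minority class) when that choice has the lower expected cost."""
    # Expected cost of predicting 0 (majority): a true minority case is missed.
    cost_if_predict_0 = p_minority * cost_fn
    # Expected cost of predicting 1 (minority): a true majority case is flagged.
    cost_if_predict_1 = (1.0 - p_minority) * cost_fp
    return 1 if cost_if_predict_0 >= cost_if_predict_1 else 0


# Hypothetical minority-class probabilities produced by some classifier.
scores = [0.05, 0.15, 0.30, 0.60]

# Assumed costs: missing a minority instance is 9 times worse than a false alarm.
labels = [predict_min_cost(p, cost_fp=1.0, cost_fn=9.0) for p in scores]
print(labels)
```

With equal costs the rule reduces to the usual 0.5 threshold. With the assumed 1:9 ratio the threshold drops to 0.1, so more instances are assigned to the minority class.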
Pages: 32
Related papers
50 in total
  • [1] Cost-sensitive learning strategies for high-dimensional and imbalanced data: a comparative study
    Pes B.
    Lai G.
    Pes, Barbara (pes@unica.it), 1600, PeerJ Inc. (07):
  • [2] Cost-sensitive learning for imbalanced data streams
    Loezer, Lucas
    Enembreck, Fabricio
    Barddal, Jean Paul
    Britto Jr, Alceu de Souza
    PROCEEDINGS OF THE 35TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING (SAC'20), 2020, : 498 - 504
  • [3] Cost-Sensitive Learning Methods for Imbalanced Data
    Nguyen Thai-Nghe
    Gantner, Zeno
    Schmidt-Thieme, Lars
    2010 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS IJCNN 2010, 2010,
  • [4] A Cost-Sensitive Feature Selection Method for High-Dimensional Data
    An, Chaojie
    Zhou, Qifeng
    14TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND EDUCATION (ICCSE 2019), 2019, : 1089 - 1094
  • [5] Analysis of imbalanced data using cost-sensitive learning
    Kim, Sojin
    Song, Jongwoo
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2025,
  • [6] Cost-sensitive learning for imbalanced medical data: a review
    Araf, Imane
    Idri, Ali
    Chairi, Ikram
    ARTIFICIAL INTELLIGENCE REVIEW, 2024, 57 (04)
  • [7] On the Role of Cost-Sensitive Learning in Imbalanced Data Oversampling
    Krawczyk, Bartosz
    Wozniak, Michal
    COMPUTATIONAL SCIENCE - ICCS 2019, PT III, 2019, 11538 : 180 - 191
  • [8] Cost-sensitive learning for imbalanced medical data: a review
    Imane Araf
    Ali Idri
    Ikram Chairi
    Artificial Intelligence Review, 57
  • [9] Cost-Sensitive Learning based on Performance Metric for Imbalanced Data
    Aurelio, Yuri Sousa
    de Almeida, Gustavo Matheus
    de Castro, Cristiano Leite
    Braga, Antonio Padua
    NEURAL PROCESSING LETTERS, 2022, 54 (04) : 3097 - 3114
  • [10] A Comparison Study of Cost-sensitive Learning and Sampling Methods on Imbalanced Data Sets
    Zhang, Jinwei
    Lu, Huijuan
    Chen, Wutao
    Lu, Yi
    ADVANCED MATERIALS AND INFORMATION TECHNOLOGY PROCESSING, PTS 1-3, 2011, 271-273 : 1291 - +