Handling class imbalance in high-dimensional biomedical datasets

Cited by: 3
Author
Pes, Barbara [1]
Affiliation
[1] Univ Cagliari, Dipartimento Matemat & Informat, Via Osped 72, I-09124 Cagliari, Italy
Source
2019 IEEE 28TH INTERNATIONAL CONFERENCE ON ENABLING TECHNOLOGIES: INFRASTRUCTURE FOR COLLABORATIVE ENTERPRISES (WETICE) | 2019
Keywords
Biomedical data analysis; Dimensionality reduction; Feature selection; Class-imbalance; Class balancing methods; Cost-sensitive classification; CLASSIFICATION;
DOI
10.1109/WETICE.2019.00040
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
When dealing with biomedical data, the first and most challenging issue is often the huge dimensionality, i.e., the presence of a very large number of features for each problem instance. A vast literature covers dimensionality reduction techniques suited to this kind of data, with a special focus on feature selection algorithms that discard uninformative features. In most cases, however, the dimensionality issue is addressed without jointly considering other potential problems in the data, including an imbalanced class distribution that may hinder the construction of effective classification models. Class imbalance, in turn, has mostly been treated in the literature as an independent problem, especially in application fields where the number of features is not so critical. Yet several biomedical datasets are both high-dimensional and class-imbalanced, so there is a strong need to design and evaluate learning strategies that deal with both issues simultaneously. In this work, we experiment with feature selection techniques in conjunction with sampling-based class balancing methods and cost-sensitive classification, in order to gain insight into the most effective strategies for such complex data.
Pages: 150-155
Page count: 6
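
The abstract names three ingredients: feature selection, sampling-based class balancing, and cost-sensitive classification. The record does not say which specific algorithms the paper evaluates or in what order it combines them, so the following is only a minimal sketch under stated assumptions. The Fisher-style ranking score, random undersampling, the inverse-frequency weighting heuristic, the function names, and the synthetic data are all illustrative choices, not the paper's method.

```python
import random
from statistics import mean, pvariance


def rank_features(X, y, k):
    """Return indices of the k features with the largest class separation.

    Assumed score: squared difference of class means divided by the sum of
    the class variances (a simple Fisher-style filter, chosen for illustration).
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        p = [row[j] for row in pos]
        n = [row[j] for row in neg]
        spread = pvariance(p) + pvariance(n) + 1e-12  # avoid division by zero
        scores.append((mean(p) - mean(n)) ** 2 / spread)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]


def undersample(X, y, seed=0):
    """Sampling-based balancing: randomly drop majority rows until classes match."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in keep], [y[i] for i in keep]


def class_weights(y):
    """Cost-sensitive alternative: weight each class inversely to its frequency."""
    n = len(y)
    classes = set(y)
    return {c: n / (len(classes) * y.count(c)) for c in classes}


# Synthetic, hypothetical data: 10 minority (label 1) and 90 majority (label 0)
# instances with 20 features; only feature 0 carries class signal.
rng = random.Random(42)
X, y = [], []
for i in range(100):
    label = 1 if i < 10 else 0
    row = [rng.gauss(3.0 * label, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(19)]
    X.append(row)
    y.append(label)

selected = rank_features(X, y, k=5)                 # reduce dimensionality
X_sel = [[row[j] for j in selected] for row in X]
X_bal, y_bal = undersample(X_sel, y)                # option A: rebalance the data
weights = class_weights(y)                          # option B: reweight the errors
```

In this sketch the balanced set `X_bal`, `y_bal` would feed an ordinary classifier, while `weights` would instead be passed to a learner that accepts per-class misclassification costs. Whether to select features before or after balancing is itself a design question of the kind the abstract raises.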