A virtual multi-label approach to imbalanced data classification

Cited by: 1
Authors
Chou, Elizabeth P. [1]
Yang, Shan-Ping [1]
Affiliations
[1] Natl Chengchi Univ, Dept Stat, Taipei, Taiwan
Keywords
Imbalance; Classification; Virtual multi-label; Equal k-means; SUPPORT;
DOI
10.1080/03610918.2022.2049820
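Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract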
Imbalanced data analysis is one of the most challenging problems in machine learning. In this setting, correctly predicting the minority label is usually more critical than correctly predicting the majority label. Traditional classifiers, however, are prone to learning bias: they tend to assign all subjects to the majority group, which yields biased predictions. Studies of imbalanced data typically take one of two perspectives, data-based or model-based. Oversampling and undersampling are examples of data-based approaches, while adding costs, penalties, or weights to optimize the algorithm is typical of a model-based approach. Some ensemble methods have also been studied recently. These methods can cause problems such as overfitting, the omission of some information, and long computation times, and they do not apply to all kinds of datasets. To address these problems, the virtual labels (ViLa) approach is proposed, which assigns virtual labels to the majority class. The study demonstrates a new multiclass classification approach based on the equal K-means clustering method. The proposed method is compared with commonly used methods for imbalanced data, including sampling methods (oversampling, undersampling, and SMOTE) and classifier methods (SVM and one-class SVM). The results show that the proposed method performs better as the degree of data imbalance increases, gradually outperforming the other methods.
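The idea in the abstract (split the majority class into equal-sized virtual classes, then train a multiclass classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the equal K-means step or the classifier in detail, so the greedy capacity-limited assignment, the nearest-centroid classifier, the default choice of `k`, and the names `equal_kmeans`, `fit_vila`, and `predict_vila` are all assumptions made for illustration.

```python
import math
from collections import Counter


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def equal_kmeans(points, k, iters=10):
    """Cluster points into k groups whose sizes differ by at most one.

    Assumption: a greedy capacity-limited assignment stands in for the
    "equal K-means" step, whose details the abstract does not give.
    """
    n = len(points)
    # Capacities sum to n, so every point ends up in exactly one cluster.
    caps = [n // k + (1 if j < n % k else 0) for j in range(k)]
    # Deterministic start: k points spread evenly through the input order.
    centroids = [points[(j * n) // k] for j in range(k)]
    labels = [None] * n
    for _ in range(iters):
        # Visit (distance, point, cluster) triples from closest to farthest.
        triples = sorted(
            (math.dist(p, c), i, j)
            for i, p in enumerate(points)
            for j, c in enumerate(centroids)
        )
        labels = [None] * n
        counts = [0] * k
        for _, i, j in triples:
            if labels[i] is None and counts[j] < caps[j]:
                labels[i] = j
                counts[j] += 1
        # Recompute each centroid from its current members.
        centroids = [
            mean_point([p for p, lab in zip(points, labels) if lab == j])
            for j in range(k)
        ]
    return labels


def fit_vila(majority, minority, k=None):
    """Give the majority class k virtual labels and store class centroids."""
    if k is None:
        # Hypothetical default: make each virtual class roughly minority-sized.
        k = max(1, round(len(majority) / len(minority)))
    virtual = equal_kmeans(majority, k)
    centroids = {"minority": mean_point(minority)}
    for j in range(k):
        members = [p for p, lab in zip(majority, virtual) if lab == j]
        centroids[f"majority_{j}"] = mean_point(members)
    return centroids, virtual


def predict_vila(centroids, x):
    """Nearest-centroid multiclass prediction, mapped back to two labels."""
    nearest = min(centroids, key=lambda name: math.dist(x, centroids[name]))
    return "minority" if nearest == "minority" else "majority"


# Toy data: 12 majority points in two groups, 3 minority points.
majority = [(x, y) for x in (0, 1, 2, 10, 11, 12) for y in (0, 1)]
minority = [(5, 8), (6, 8), (5, 9)]
centroids, virtual = fit_vila(majority, minority)
print(Counter(virtual))                     # sizes of the virtual classes
print(predict_vila(centroids, (5.5, 8.5)))  # a point near the minority group
```

After relabeling, the classifier sees several classes of similar size instead of one dominant class, which is the rebalancing effect the abstract describes. Any multiclass classifier could replace the nearest-centroid rule used here.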
Pages: 1461-1471
Number of pages: 11