A new data complexity measure for multi-class imbalanced classification tasks

Authors
Han, Mingming [1]
Guo, Husheng [1,2]
Wang, Wenjian [1,2]
Affiliations
[1] Shanxi Univ, Sch Comp & Informat Technol, Taiyuan 030006, Shanxi, Peoples R China
[2] Shanxi Univ, Key Lab Computat Intelligence & Chinese Informat P, Minist Educ, Taiyuan 030006, Shanxi, Peoples R China
Keywords
Data characteristic; Skewed distribution; Correlation; Multi-class
DOI
10.1016/j.patcog.2024.110881
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Skewed class distributions and data complexity can severely degrade the results of imbalanced classification. The cost of classification can be significantly reduced if data complexity is measured and the data are pre-processed before training, particularly for large-scale and high-dimensional datasets. Although many methods have been proposed to evaluate data complexity, most fail to fully consider the interaction among different data characteristics, or the connection between class imbalance and these characteristics, which makes it hard to evaluate classification difficulty effectively. This paper presents a new data complexity measure, MFII (multi-factor imbalance index), which measures the combined effects of the skewed class distribution and data characteristics on classification difficulty. In particular, it investigates the impact of overlap size, confusion degree, and sub-cluster structure. VoR (value of resolution) and DoC (degree of consistency) are also proposed to evaluate the resolution and interpretability of complexity measures. The experimental results show that MFII has a lower VoR and a stronger correlation with classification metrics, which indicates that MFII can more accurately evaluate the difficulty of multi-class imbalanced classification tasks.
Pages: 13