A new data complexity measure for multi-class imbalanced classification tasks

Authors
Han, Mingming [1]
Guo, Husheng [1,2]
Wang, Wenjian [1,2]
Affiliations
[1] Shanxi Univ, Sch Comp & Informat Technol, Taiyuan 030006, Shanxi, Peoples R China
[2] Shanxi Univ, Key Lab Computat Intelligence & Chinese Informat P, Minist Educ, Taiyuan 030006, Shanxi, Peoples R China
Keywords
Data characteristic; Skewed distribution; Correlation; Multi-class;
DOI
10.1016/j.patcog.2024.110881
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Skewed class distributions and data complexity can severely degrade imbalanced classification results. The cost of classification can be significantly reduced if data complexity is measured and the data are pre-processed prior to training, particularly for large-scale and high-dimensional datasets. Although many methods have been proposed to evaluate data complexity, most fail to fully consider the interactions among different data characteristics, or the connection between class imbalance and these characteristics, which makes it hard to evaluate classification difficulty effectively. This paper presents a new data complexity measure, MFII (multi-factor imbalance index), which measures the combined effects of the skewed class distribution and data characteristics on classification difficulty. In particular, it investigates the impact of overlap size, confusion degree, and sub-cluster structure. VoR (value of resolution) and DoC (degree of consistency) are also proposed to evaluate the resolution and interpretability of complexity measures. The experimental results demonstrate that MFII has a lower VoR and a stronger correlation with classification metrics, which indicates that MFII can more accurately evaluate the difficulty of multi-class imbalanced classification tasks.
Pages: 13