Evolutionary optimization of the area under precision-recall curve for classifying imbalanced multi-class data

Cited by: 0
Authors
Chabbouh, Marwa [1]
Bechikh, Slim [2]
Mezura-Montes, Efren [3]
Ben Said, Lamjed [1]
Affiliations
[1] Univ Tunis, SMART Lab, ISG Campus,Liberty St, Tunis 2000, Tunisia
[2] Univ Tunis, SMART Lab, IEEE SM, ISG Campus,Liberty St, Tunis 2000, Tunisia
[3] Univ Veracruz, Artificial Intelligence Res Inst, Calle Paseo 112, Xalapa 91097, Veracruz, Mexico
Keywords
Multi-class classification; Imbalanced data; Genetic-based machine learning; Area under precision-recall curve; DECISION TREES; STATISTICAL COMPARISONS; DATA-SETS; CLASSIFICATION; CLASSIFIERS; ALGORITHM; PERFORMANCE; FRAMEWORK; ENSEMBLES
DOI
10.1007/s10732-024-09544-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Classification of imbalanced multi-class data remains one of the most challenging problems in machine learning and data mining. The task becomes even harder when the classes with fewer instances lie in overlapping regions. Several approaches have been proposed in the literature to deal with these two issues, such as decomposition, ensemble design, misclassification costs, and ad-hoc strategies. Despite these efforts, far fewer works address imbalance in multi-class data than in binary classification. Moreover, existing approaches still suffer from several limitations: difficulty in handling imbalance across multiple classes, challenges in adapting sampling techniques, shortcomings of certain classifiers, the need for specialized evaluation metrics, the complexity of data representation, and increased computational cost. Motivated by these observations, we propose a multi-objective evolutionary induction approach that evolves a population of NLM-DTs (Non-Linear Multivariate Decision Trees) using θ-NSGA-III (θ-Non-dominated Sorting Genetic Algorithm-III) as the search engine. 
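NSGA-III, on which θ-NSGA-III builds, begins each selection step by sorting the population into successive non-dominated (Pareto) fronts. The Python sketch below illustrates only that step, assuming every candidate tree has already been scored on two objectives that are both maximized. The θ-based niching and reference-direction machinery of θ-NSGA-III is not reproduced, and the function names and scores are hypothetical, not taken from the authors' implementation.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b when every objective is maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_fronts(points: List[Sequence[float]]) -> List[List[int]]:
    """Split point indices into successive Pareto fronts (front 0 = non-dominated)."""
    n = len(points)
    dominated_by: List[List[int]] = [[] for _ in range(n)]  # points that i dominates
    domination_count = [0] * n  # how many points dominate i
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(points[i], points[j]):
                dominated_by[i].append(j)
            elif dominates(points[j], points[i]):
                domination_count[i] += 1
    fronts = [[i for i in range(n) if domination_count[i] == 0]]
    while fronts[-1]:
        next_front = []
        for i in fronts[-1]:
            # Removing the current front may leave some points undominated.
            for j in dominated_by[i]:
                domination_count[j] -= 1
                if domination_count[j] == 0:
                    next_front.append(j)
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical two-objective scores for five candidate trees.
scores = [(0.80, 0.60), (0.70, 0.75), (0.65, 0.70), (0.60, 0.55), (0.82, 0.58)]
print(non_dominated_fronts(scores))  # [[0, 1, 4], [2], [3]]
```

Trees 0, 1 and 4 form the first front because none of them is beaten on both objectives at once, while tree 3 is dominated by every other tree and lands in the last front.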
The resulting algorithm, termed EMO-NLM-DT (Evolutionary Multi-objective Optimization of NLM-DTs), optimizes the construction of NLM-DTs for imbalanced multi-class classification by simultaneously maximizing the Macro-Average-Precision and the Macro-Average-Recall, two possibly conflicting objectives. These two measures were chosen as objective functions following a recent study on the appropriateness of performance metrics for imbalanced data classification, which suggests that the mAURPC (mean Area Under Recall Precision Curve) satisfies all the necessary conditions for imbalanced multi-class classification. Moreover, adopting the NLM-DT as the baseline classifier to be optimized allows the generation of non-linear hyperplanes that adapt well to the geometric shapes of the class boundaries. A statistical analysis of comparative experiments on more than twenty imbalanced multi-class data sets shows that EMO-NLM-DT outperforms seven relevant, recent state-of-the-art methods, building NLM-DTs that are highly effective at classifying imbalanced multi-class data.
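The two objectives and the mAURPC can be made concrete with a minimal pure-Python sketch. It rests on stated assumptions: macro-averaging is taken as the unweighted mean of per-class one-vs-rest precision and recall, a class that is never predicted is assigned precision 0, and the area under each per-class precision-recall curve is estimated with the step-wise average-precision rule. The cited study may define mAURPC with a different interpolation, and all function names here are hypothetical rather than the authors' code.

```python
from typing import Dict, List, Sequence, Tuple


def macro_precision_recall(y_true: Sequence[int], y_pred: Sequence[int],
                           classes: Sequence[int]) -> Tuple[float, float]:
    """Unweighted mean of per-class (one-vs-rest) precision and recall."""
    pairs = list(zip(y_true, y_pred))
    precisions: List[float] = []
    recalls: List[float] = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Assumed convention: never predicted -> precision 0; absent -> recall 0.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    k = len(classes)
    return sum(precisions) / k, sum(recalls) / k


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under one binary precision-recall curve (labels are 0/1).

    Tied scores are broken by input order, which is a simplification.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    tp, area = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
            area += tp / rank  # precision at the threshold where recall rises
    return area / n_pos if n_pos else 0.0


def m_aurpc(y_true: Sequence[int], class_scores: Sequence[Dict[int, float]],
            classes: Sequence[int]) -> float:
    """Mean over classes of the one-vs-rest precision-recall area."""
    areas = []
    for c in classes:
        labels = [1 if t == c else 0 for t in y_true]
        scores = [s[c] for s in class_scores]  # s[c]: classifier score for class c
        areas.append(average_precision(labels, scores))
    return sum(areas) / len(areas)


y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
print(macro_precision_recall(y_true, y_pred, classes=[0, 1, 2]))  # both ≈ 8/9
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # ≈ 0.833 (5/6)
```

Because each class contributes at most 1/K to each macro average (K classes), a tree that never recognizes a small minority class forfeits that class's full share of both objectives, however few instances the class contains. Plain accuracy, which weights classes by their instance counts, offers no such protection.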
Pages: 66
Related papers (50 in total)
  • [41] F-Measure Optimization for Multi-class, Imbalanced Emotion Classification Tasks
    Inan, Toki Tahmid
    Liu, Mingrui
    Shehu, Amarda
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT I, 2022, 13529 : 158 - 170
  • [42] Cluster-based Under-sampling with Random Forest for Multi-Class Imbalanced Classification
    Arafat, Md. Yasir
    Hoque, Sabera
    Farid, Dewan Md.
    2017 11TH INTERNATIONAL CONFERENCE ON SOFTWARE, KNOWLEDGE, INFORMATION MANAGEMENT AND APPLICATIONS (SKIMA), 2017
  • [43] MULTI-CLASS DATA CLASSIFICATION FOR IMBALANCED DATA SET USING COMBINED SAMPLING APPROACHES
    Prachuabsupakij, Wanthanee
    Snonthornphisaj, Nuanwan
    KDIR 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND INFORMATION RETRIEVAL, 2011: 166 - 171
  • [44] MKC-SMOTE: A Novel Synthetic Oversampling Method for Multi-Class Imbalanced Data Classification
    Wang, Jiao
    Awang, Norhashidah
    IEEE ACCESS, 2024, 12 : 196929 - 196938
  • [45] A Case Study of Multi-class Classification with Diversified Precision Recall Requirements for Query Disambiguation
    Yang, Yingrui
    Miller, Christopher
    Jiang, Peng
    Moghtaderi, Azadeh
    PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020: 1633 - 1636
  • [46] A Novel Double Ensemble Algorithm for the Classification of Multi-Class Imbalanced Hyperspectral Data
    Quan, Daying
    Feng, Wei
    Dauphin, Gabriel
    Wang, Xiaofeng
    Huang, Wenjiang
    Xing, Mengdao
    REMOTE SENSING, 2022, 14 (15)
  • [47] Multi-class random forest model to classify wastewater treatment imbalanced data
    Distefano, Veronica
    Palma, Monica
    De Iaco, Sandra
    SOCIO-ECONOMIC PLANNING SCIENCES, 2024, 95
  • [48] Gene Selection in Multi-class Imbalanced Microarray Datasets Using Dynamic Length Particle Swarm Optimization
    Priya, R. Devi
    Sivaraj, R.
    CURRENT BIOINFORMATICS, 2021, 16 (05) : 734 - 748
  • [49] An Experimental Analysis of Drift Detection Methods on Multi-Class Imbalanced Data Streams
    Palli, Abdul Sattar
    Jaafar, Jafreezal
    Gomes, Heitor Murilo
    Hashmani, Manzoor Ahmed
    Gilal, Abdul Rehman
    APPLIED SCIENCES-BASEL, 2022, 12 (22)
  • [50] MCNN-LSTM: Combining CNN and LSTM to Classify Multi-Class Text in Imbalanced News Data
    Hasib, Khan Md
    Azam, Sami
    Karim, Asif
    Marouf, Ahmed Al
    Shamrat, F. M. Javed Mehedi
    Montaha, Sidratul
    Yeo, Kheng Cher
    Jonkman, Mirjam
    Alhajj, Reda
    Rokne, Jon G.
    IEEE ACCESS, 2023, 11 : 93048 - 93063