Probabilistic Metric to measure the imbalance in multi-class problems

Cited by: 0
Authors
Lopes Agostinho, Solander Patricio [1]
Mendes-Moreira, Joao
Affiliations
[1] Univ Porto, LIAAD INESC TEC, R Dr Roberto Frias, P-4200465 Porto, Portugal
Source
FOURTH INTERNATIONAL WORKSHOP ON LEARNING WITH IMBALANCED DOMAINS: THEORY AND APPLICATIONS, VOL 183 | 2022 / Vol. 183
Keywords
imbalanced data; multi-class domain; classification; probabilistic metric
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In machine learning, imbalanced data is one of the most relevant issues that classifiers have to deal with. The most common techniques applied in this scenario are all, in some way, based on oversampling or undersampling. In the former, the number of instances of the minority classes is increased, while in the latter, the number of instances of the majority classes is reduced. Pre-processing approaches such as these have been successful in binary classification problems. However, the larger the number of classes, the less effective the pre-processing approaches are. Another related problem is that the metrics that evaluate the predictive performance of classifiers may not be effective in the presence of imbalanced data. These metrics can be divided into three groups: threshold, ranking, and probabilistic metrics. This paper aims to propose a probabilistic metric whose main objective is, given the results of a classifier in a multi-class domain, to verify the relation between these results and the imbalance problem. The main purpose of this work is to build a probabilistic metric, based on non-parametric approaches, that measures the effect of dataset imbalance in multi-class problems. As part of the work, a comparison with existing metrics will be implemented and analyzed, both to understand the relation between them and to choose the best one for each scenario.
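The abstract's remark that performance metrics can be ineffective under imbalance can be illustrated with a small sketch. It is not the probabilistic metric proposed in the paper, which this record does not specify. It only contrasts two standard threshold metrics (plain accuracy and balanced accuracy, the mean of per-class recall) on made-up three-class labels, where a classifier that always predicts the majority class still looks strong on accuracy. The label values and class proportions are assumptions chosen for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    support = Counter(y_true)  # number of true instances per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    return sum(hits[c] / support[c] for c in support) / len(support)


# Hypothetical imbalanced 3-class data: 90 / 8 / 2 instances (illustrative only).
y_true = ["a"] * 90 + ["b"] * 8 + ["c"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["a"] * 100

acc = accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.3f}  balanced_accuracy={bal:.3f}")
```

Here accuracy rewards the majority-class guesser, while balanced accuracy drops to the chance level for three classes, which is the kind of gap that motivates imbalance-aware metrics.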
Pages: 151-162
Number of pages: 12