Multi-objective evolutionary optimization using the relationship between F1 and accuracy metrics in classification tasks

Cited by: 10
Authors
Carlos Fernandez, Juan [1]
Carbonero, Mariano [2]
Antonio Gutierrez, Pedro [1]
Hervas-Martinez, Cesar [1]
Affiliations
[1] Univ Cordoba, Dept Comp Sci & Numer Anal, E-14071 Cordoba, Spain
[2] Univ Loyola Andalucia, Dept Quantitat Methods, Cordoba, Spain
Keywords
Binary classification; Evaluation metrics; F-1-metric; Multi-objective evolutionary algorithms; PERFORMANCE; ALGORITHM; SYSTEM; MODEL;
DOI
10.1007/s10489-019-01447-y
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work analyses the complementarity and contrast between two metrics commonly used for evaluating the quality of a binary classifier: the correct classification rate or accuracy, C, and the F1 metric, which is very popular when dealing with imbalanced datasets. Based on this analysis, a set of constraints relating C and F1 is defined as a function of the ratio of positive patterns in the dataset. We evaluate the possibility of using a multi-objective evolutionary algorithm guided by this pair of metrics to optimise binary classification models. To check the validity of the constraints, we perform an empirical analysis considering 26 benchmark datasets obtained from the UCI repository and a liver transplant dataset. The results show that the relation is fulfilled and that using the algorithm to optimise the pair (C, F1) simultaneously leads to a generally balanced accuracy for both classes. The experiments also reveal that, in some cases, better results are obtained by using the majority class as the positive class instead of the minority class, which is the most common choice with imbalanced datasets.
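The two metrics named in the abstract can be computed directly from a confusion matrix. The following is a minimal Python sketch, not the authors' implementation: the function names and the confusion-matrix counts are hypothetical, chosen only for illustration. The bound in `f1_upper_bound` follows from the standard definitions of C and F1 (with the number of errors fixed by C, F1 grows with the number of true positives, which cannot exceed the number of positive patterns); it is not necessarily one of the constraints derived in the paper.

```python
def accuracy(tp, fp, fn, tn):
    """Correct classification rate C = (TP + TN) / N."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_upper_bound(c, p):
    """Bound F1 <= 2p / (2p + 1 - C) implied by the definitions alone.

    c is the accuracy and p is the ratio of positive patterns.
    Illustrative only; not taken from the paper.
    """
    denom = 2 * p + 1 - c
    return 2 * p / denom if denom else 0.0


# Hypothetical imbalanced problem: 10 positive and 90 negative patterns.
tp, fp, fn, tn = 8, 4, 2, 86
n = tp + fp + fn + tn
p = (tp + fn) / n                     # ratio of positive patterns

c = accuracy(tp, fp, fn, tn)
f1_minority = f1_score(tp, fp, fn)    # minority class treated as positive

# Treating the majority class as positive swaps TP with TN and FP with FN.
# Accuracy is unchanged by the swap, but F1 is not.
f1_majority = f1_score(tn, fn, fp)

print(f"C = {c:.3f}, p = {p:.2f}")
print(f"F1 (minority positive) = {f1_minority:.3f}")
print(f"F1 (majority positive) = {f1_majority:.3f}")
print(f"bound on F1 (minority positive) = {f1_upper_bound(c, p):.3f}")
```

The sketch shows why the choice of positive class matters: C is symmetric in the two classes, whereas F1 ignores true negatives and therefore changes when the class roles are exchanged.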
Pages: 3447-3463
Number of pages: 17