An efficiency curve for evaluating imbalanced classifiers considering intrinsic data characteristics: Experimental analysis

Cited by: 12
Authors
Chao, Xiangrui [1]
Kou, Gang [2]
Peng, Yi [3]
Fernandez, Alberto [4]
Affiliations
[1] Sichuan Univ, Business Sch, Chengdu 610065, Peoples R China
[2] Southwestern Univ Finance & Econ, Sch Business Adm, Chengdu 611130, Peoples R China
[3] Univ Elect Sci & Technol China, Sch Management & Econ, Chengdu 611731, Peoples R China
[4] Univ Granada, Andalusian Res Inst Data Sci & Computat Intellige, Dept Comp Sci & Artificial Intelligence, E-18071 Granada, Spain
Funding
National Natural Science Foundation of China;
Keywords
Classification; Imbalanced dataset; Data intrinsic characteristics; Assessment metrics; Efficiency; DECISION-MAKING; CLASSIFICATION; PERFORMANCE; BENCHMARKING; ENSEMBLES; SMOTE;
DOI
10.1016/j.ins.2022.06.045
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Balancing the accuracy rates of the majority and minority classes is challenging in imbalanced classification. Furthermore, data characteristics have a significant impact on the performance of imbalanced classifiers, yet existing evaluation methods generally neglect them. The objective of this study is to introduce a new criterion to comprehensively evaluate imbalanced classifiers. Specifically, we introduce an efficiency curve, established using data envelopment analysis without explicit inputs (DEA-WEI), to determine the trade-off between the benefit of improved minority class accuracy and the cost of reduced majority class accuracy. We then analyze the impact of the imbalance ratio and typical imbalanced data characteristics on the efficiency of the classifiers. Empirical analyses using 68 imbalanced datasets reveal that traditional classifiers such as C4.5 and k-nearest neighbor are more effective on disjunct data, whereas ensemble and undersampling techniques are more effective for overlapping and noisy data. The efficiency of cost-sensitive classifiers decreases dramatically as the imbalance ratio increases. Finally, we investigate the reasons for the different efficiencies of classifiers on imbalanced data and recommend steps to select appropriate classifiers for imbalanced data based on data characteristics. (C) 2022 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
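To make the DEA-WEI idea in the abstract concrete, the following is a minimal sketch of a generic two-output multiplier model, not necessarily the exact formulation used in the paper. Each classifier is treated as a unit whose two outputs are its minority class accuracy and majority class accuracy; its efficiency is the largest weighted output sum it can reach when no unit is allowed to exceed 1 under the same weights. The function name `dea_wei_efficiency` and the four classifier scores are hypothetical, and the small linear program is solved by enumerating vertices in pure Python rather than by calling an LP solver.

```python
from itertools import combinations


def dea_wei_efficiency(outputs, target, eps=1e-9):
    """Efficiency of outputs[target] under a two-output multiplier model.

    Maximise u1*y1 + u2*y2 for the target unit, subject to
    u1*y1_j + u2*y2_j <= 1 for every unit j and u1, u2 >= 0.
    """
    # Each boundary line is (a, b, c), meaning a*u1 + b*u2 = c.
    lines = [(y1, y2, 1.0) for y1, y2 in outputs]
    lines += [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # axes u1 = 0 and u2 = 0

    def feasible(u1, u2):
        if u1 < -eps or u2 < -eps:
            return False
        return all(u1 * y1 + u2 * y2 <= 1.0 + eps for y1, y2 in outputs)

    y1_o, y2_o = outputs[target]
    best = 0.0
    # A bounded LP in two variables attains its optimum at a vertex,
    # i.e. at the intersection of two boundary lines.
    for (a1, b1, c1), (a2, b2, c2) in combinations(lines, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue  # parallel lines, no unique intersection
        u1 = (c1 * b2 - c2 * b1) / det  # Cramer's rule
        u2 = (a1 * c2 - a2 * c1) / det
        if feasible(u1, u2):
            best = max(best, u1 * y1_o + u2 * y2_o)
    return best


# Hypothetical (minority accuracy, majority accuracy) pairs for four classifiers.
scores = [(0.9, 0.6), (0.6, 0.9), (0.5, 0.5), (0.8, 0.8)]
efficiencies = [dea_wei_efficiency(scores, i) for i in range(len(scores))]
```

In this toy data the first, second, and fourth classifiers lie on the efficient frontier, while the third is dominated: it would need to scale both accuracies up to those of the fourth classifier to become efficient.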
Pages: 1131-1156
Page count: 26