Tuning model parameters in class-imbalanced learning with precision-recall curve

Cited by: 56
Authors
Fu, Guang-Hui [1]
Yi, Lun-Zhao [2]
Pan, Jianxin [3]
Affiliations
[1] Kunming Univ Sci & Technol, Sch Sci, Kunming, Yunnan, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Food Safety Res Inst, Kunming, Yunnan, Peoples R China
[3] Univ Manchester, Sch Math, Manchester M13 9PL, Lancs, England
Funding
National Natural Science Foundation of China
Keywords
class imbalance; measurement; parameter tuning; precision-recall curve; receiver operating characteristic; support vector machines; variable selection; regularization; classification
DOI
10.1002/bimj.201800148
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
An issue for class-imbalanced learning is what assessment metric should be employed. So far, precision-recall curve (PRC) as a metric is rarely used in practice as compared with its alternative of receiver operating characteristic (ROC). This study investigates the performance of PRC as the evaluating criterion to address the class-imbalanced data and focuses on the comparison of PRC with ROC. The advantages of PRC over ROC on assessing class-imbalanced data are also investigated and tested on our proposed algorithm by tuning the whole model parameters in simulation studies and real data examples. The result shows that PRC is competitive with ROC as performance measurement for handling class-imbalanced data in tuning the model parameters. PRC can be considered as an alternative but effective assessment for preprocessing (such as variable selection) skewed data and building a classifier in class-imbalanced learning.
Pages: 652-664
Number of pages: 13