Parameter-free classification in multi-class imbalanced data sets

Cited by: 20
Authors
Cerf, Loic [1 ]
Gay, Dominique [2 ]
Selmaoui-Folcher, Nazha [3 ]
Cremilleux, Bruno [4 ]
Boulicaut, Jean-Francois [5 ]
Affiliations
[1] Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil
[2] Orange Labs, F-22307 Lannion, France
[3] Univ New Caledonia, PPME EA3325, Noumea, New Caledonia
[4] Univ Caen, GREYC CNRS UMR6072, F-14032 Caen, France
[5] Univ Lyon, CNRS, INRIA, INSA Lyon,LIRIS,UMR5205, F-69621 Villeurbanne, France
Keywords
Classification; Association rules; Multi-class context; Imbalanced data set; One-Versus-Each framework; DISCOVERY; PATTERNS; SMOTE
DOI
10.1016/j.datak.2013.06.001
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many applications deal with classification in multi-class imbalanced contexts. In such difficult situations, classical CBA-like approaches (Classification Based on Association rules) show their limits. Most CBA-like methods actually are One-Vs-All approaches (OVA), i.e., the selected classification rules are relevant for one class and irrelevant for the union of the other classes. In this paper, we point out recurrent problems encountered by OVA approaches applied to multi-class imbalanced data sets (e.g., improper bias towards majority classes, conflicting rules). That is why we propose a new One-Versus-Each (OVE) framework. In this framework, a rule has to be relevant for one class and irrelevant for every other class taken separately. Our approach, called fitcare, is empirically validated on various benchmark data sets and our theoretical findings are confirmed. (C) 2013 Elsevier B.V. All rights reserved.
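The contrast between the two frameworks described in the abstract can be illustrated with a minimal Python sketch. It is not the fitcare algorithm: the function names, the example counts, and the fixed thresholds `min_freq` and `max_freq` are all hypothetical, chosen only for illustration (the paper describes its method as parameter-free, so it would not take such thresholds as user input). The sketch assumes a rule is "relevant" for a class when its body is frequent there and "irrelevant" when it is infrequent.

```python
from typing import Dict


def is_ova_rule(support: Dict[str, int], class_sizes: Dict[str, int],
                target: str, min_freq: float, max_freq: float) -> bool:
    """One-Vs-All check: frequent in the target class, infrequent in the
    UNION of all other classes (hypothetical fixed thresholds)."""
    target_freq = support.get(target, 0) / class_sizes[target]
    rest_support = sum(support.get(c, 0) for c in class_sizes if c != target)
    rest_size = sum(n for c, n in class_sizes.items() if c != target)
    return target_freq >= min_freq and rest_support / rest_size <= max_freq


def is_ove_rule(support: Dict[str, int], class_sizes: Dict[str, int],
                target: str, min_freq: float, max_freq: float) -> bool:
    """One-Versus-Each check: frequent in the target class, infrequent in
    EVERY other class taken separately (hypothetical fixed thresholds)."""
    freqs = {c: support.get(c, 0) / class_sizes[c] for c in class_sizes}
    return freqs[target] >= min_freq and all(
        freqs[c] <= max_freq for c in class_sizes if c != target
    )


# Made-up imbalanced data set: class "b" is the majority, "c" a small minority.
class_sizes = {"a": 100, "b": 880, "c": 20}
# Made-up per-class counts of objects matching some rule body.
support = {"a": 60, "b": 20, "c": 15}

ova = is_ova_rule(support, class_sizes, "a", min_freq=0.5, max_freq=0.1)
ove = is_ove_rule(support, class_sizes, "a", min_freq=0.5, max_freq=0.1)
print(ova, ove)
```

In this made-up example the rule body covers 75% of minority class "c", but that is hidden when "c" is merged with the large class "b", so the OVA check accepts a rule for "a" that would conflict with "c". The OVE check rejects it because each other class is examined on its own.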
Pages: 109-129
Page count: 21