Parameter-free classification in multi-class imbalanced data sets

Cited by: 20
Authors
Cerf, Loic [1]
Gay, Dominique [2]
Selmaoui-Folcher, Nazha [3]
Cremilleux, Bruno [4]
Boulicaut, Jean-Francois [5]
Affiliations
[1] Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil
[2] Orange Labs, F-22307 Lannion, France
[3] Univ New Caledonia, PPME EA3325, Noumea, New Caledonia
[4] Univ Caen, GREYC CNRS UMR6072, F-14032 Caen, France
[5] Univ Lyon, CNRS, INRIA, INSA Lyon, LIRIS, UMR5205, F-69621 Villeurbanne, France
Keywords
Classification; Association rules; Multi-class context; Imbalanced data set; One-Versus-Each framework; DISCOVERY; PATTERNS; SMOTE
DOI
10.1016/j.datak.2013.06.001
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many applications deal with classification in multi-class imbalanced contexts. In such difficult situations, classical CBA-like approaches (Classification Based on Association rules) show their limits. Most CBA-like methods actually are One-Vs-All approaches (OVA), i.e., the selected classification rules are relevant for one class and irrelevant for the union of the other classes. In this paper, we point out recurrent problems encountered by OVA approaches applied to multi-class imbalanced data sets (e.g., improper bias towards majority classes, conflicting rules). That is why we propose a new One-Versus-Each (OVE) framework. In this framework, a rule has to be relevant for one class and irrelevant for every other class taken separately. Our approach, called fitcare, is empirically validated on various benchmark data sets and our theoretical findings are confirmed. (C) 2013 Elsevier B.V. All rights reserved.
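The difference between the OVA and OVE criteria described in the abstract can be illustrated with a minimal sketch. It is not the fitcare algorithm: the fixed thresholds `min_freq` and `max_freq`, the function names, and the toy counts are all assumptions made for illustration (the paper's approach is parameter-free, and the record does not say how it decides relevance). The sketch only shows why a rule that looks irrelevant for the union of the other classes can still be relevant for a small minority class.

```python
from typing import Dict


def rel_freq(count: int, size: int) -> float:
    """Relative frequency of a rule body within one group of objects."""
    return count / size if size else 0.0


def is_ova_rule(counts: Dict[str, int], sizes: Dict[str, int], target: str,
                min_freq: float, max_freq: float) -> bool:
    """One-Vs-All check (hypothetical thresholds): relevant for `target`,
    irrelevant for the UNION of all other classes."""
    others = [c for c in sizes if c != target]
    union_count = sum(counts[c] for c in others)
    union_size = sum(sizes[c] for c in others)
    return (rel_freq(counts[target], sizes[target]) >= min_freq
            and rel_freq(union_count, union_size) <= max_freq)


def is_ove_rule(counts: Dict[str, int], sizes: Dict[str, int], target: str,
                min_freq: float, max_freq: float) -> bool:
    """One-Versus-Each check (hypothetical thresholds): relevant for `target`,
    irrelevant for EVERY other class taken separately."""
    others = [c for c in sizes if c != target]
    return (rel_freq(counts[target], sizes[target]) >= min_freq
            and all(rel_freq(counts[c], sizes[c]) <= max_freq for c in others))


# Toy imbalanced data set (invented numbers): class C is a small minority.
sizes = {"A": 1000, "B": 1000, "C": 10}
# Number of objects in each class that contain the rule body.
counts = {"A": 800, "B": 10, "C": 9}

ova_ok = is_ova_rule(counts, sizes, "A", min_freq=0.5, max_freq=0.1)
ove_ok = is_ove_rule(counts, sizes, "A", min_freq=0.5, max_freq=0.1)
print(ova_ok, ove_ok)
```

In this toy case the rule body covers 9 of the 10 objects of class C, yet the union of B and C dilutes that to 19 of 1010 objects, so the OVA check accepts the rule for class A while the OVE check rejects it.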
Pages: 109-129
Page count: 21