Bootstrap re-sampling for unbalanced data in supervised learning

Times cited: 43
Authors
Dupret, G [1]
Koda, M [1]
Affiliation
[1] Univ Tsukuba, Inst Policy & Planning Sci, Tsukuba, Ibaraki 3058573, Japan
Keywords
neural networks; decision support systems; simulation; data mining
DOI
10.1016/S0377-2217(00)00244-7
Chinese Library Classification (CLC) number
C93 [Management science]
Discipline classification codes
12; 1201; 1202; 120202
Abstract
This paper presents a technical framework for assessing how re-sampling affects the ability of a supervised learning algorithm to learn a classification problem correctly. We use the bootstrap expression of the prediction error to identify the optimal re-sampling proportions in binary classification experiments with artificial neural networks. Based on the Bayes decision rule and the a priori distribution of the objective data, we derive an estimate of the optimal re-sampling proportion, together with upper and lower bounds on the exact optimal proportion. Analytical considerations for extending the method to cross-validation and to multiple classes are also presented. (C) 2001 Elsevier Science B.V. All rights reserved.
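The abstract describes choosing a re-sampling proportion by minimising a bootstrap estimate of prediction error under the Bayes decision rule. The Python sketch below illustrates that general idea on a toy problem. It is not the authors' procedure. Everything in it is an assumption made for illustration: the synthetic 1-D Gaussian data, a Gaussian naive-Bayes classifier standing in for the neural networks used in the paper, Efron's leave-one-out bootstrap as the error expression, and the hypothetical helper names (`make_data`, `fit_gaussian_nb`, `predict`, `resample`, `loo_bootstrap_error`).

```python
# Hypothetical sketch: Gaussian naive Bayes stands in for the paper's neural
# networks, and the data are synthetic. This is not the authors' method.
import math
import random
from statistics import fmean, pstdev


def make_data(n, prior1, rng):
    """Hypothetical unbalanced 1-D data: class 1 ~ N(+1, 1), class 0 ~ N(-1, 1)."""
    data = []
    for _ in range(n):
        y = 1 if rng.random() < prior1 else 0
        data.append((rng.gauss(1.0 if y else -1.0, 1.0), y))
    return data


def fit_gaussian_nb(sample):
    """Estimate (prior, mean, std) per class from the (re-sampled) training set."""
    params = {}
    for c in (0, 1):
        xs = [x for x, y in sample if y == c]
        params[c] = (len(xs) / len(sample), fmean(xs), max(pstdev(xs), 1e-6))
    return params


def predict(params, x):
    """Bayes decision rule: choose the class maximising prior * Gaussian likelihood."""
    def log_score(c):
        prior, mu, sd = params[c]
        return math.log(prior) - math.log(sd) - 0.5 * ((x - mu) / sd) ** 2
    return 1 if log_score(1) > log_score(0) else 0


def resample(data, p1, rng):
    """Bootstrap draw of len(data) indices, a fraction p1 of them from class 1."""
    ones = [i for i, (_, y) in enumerate(data) if y == 1]
    zeros = [i for i, (_, y) in enumerate(data) if y == 0]
    if len(ones) < 2 or len(zeros) < 2:
        raise ValueError("need at least two examples of each class")
    n = len(data)
    n1 = min(max(round(p1 * n), 2), n - 2)  # keep both classes in every draw
    return [rng.choice(ones) for _ in range(n1)] + [rng.choice(zeros) for _ in range(n - n1)]


def loo_bootstrap_error(data, p1, n_boot, rng):
    """Leave-one-out bootstrap error: each original point is scored only by
    models whose re-sampled training set did not contain it."""
    n = len(data)
    err_sum, counts = [0.0] * n, [0] * n
    for _ in range(n_boot):
        idx = resample(data, p1, rng)
        in_bag = set(idx)
        params = fit_gaussian_nb([data[i] for i in idx])
        for i, (x, y) in enumerate(data):
            if i not in in_bag:
                err_sum[i] += predict(params, x) != y
                counts[i] += 1
    per_point = [e / c for e, c in zip(err_sum, counts) if c > 0]
    return sum(per_point) / len(per_point)


if __name__ == "__main__":
    rng = random.Random(0)
    data = make_data(200, prior1=0.1, rng=rng)
    for p1 in (0.1, 0.2, 0.3, 0.5):
        print(f"p1={p1:.1f}  estimated error={loo_bootstrap_error(data, p1, 50, rng):.3f}")
```

Sweeping `p1` and keeping the value with the smallest estimated error mimics, in spirit, the search for an optimal re-sampling proportion. Because the classifier's class priors are estimated from the re-sampled set, changing `p1` shifts its decision threshold, which is why the proportion matters at all. When the minority class is heavily over-sampled, its points are rarely out-of-bag, so the estimate for large `p1` rests on few minority-class evaluations.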
Pages: 141-156
Number of pages: 16