Mining Rare Events Data by Sampling and Boosting: A Case Study

Cited by: 0
Authors
Au, Tom
Chin, Meei-Ling Ivy
Ma, Guangqin
Source
INFORMATION SYSTEMS, TECHNOLOGY AND MANAGEMENT, PROCEEDINGS | 2010, Vol. 54
Keywords
Rare Events; Boosting; Case-Based Sampling and AUC;
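DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract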
In data mining, popular model ensemble techniques such as boosting are often used to improve the performance of predictive models. When mining data with rare events (far less than 5%), boosting may improve a model's overall predictive power, but the accuracy and efficiency of model estimation are negatively affected when simple random sampling is employed. In this study we investigate the performance of applying boosting to an unbalanced sampling procedure called case-based sampling. We demonstrate the performance of the combined procedure in predicting customer attrition using actual telecommunications data. Our results show that the combination of boosting and case-based sampling is very effective at alleviating the problem of rare events.
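The abstract does not spell out the sampling procedure, so the following is only a minimal sketch under stated assumptions: case-based sampling is taken to mean keeping every rare event and drawing a random subset of non-events (as in a case-control design), and AUC is computed as the probability that a randomly chosen event scores higher than a randomly chosen non-event. The names `case_based_sample`, `controls_per_case`, and `auc` are hypothetical and do not come from the paper.

```python
import random


def case_based_sample(rows, labels, controls_per_case=1, seed=0):
    """Keep every event (label 1) plus a random subset of non-events (label 0).

    Assumption: this mirrors a case-control design; the paper's exact
    procedure is not given in the abstract.
    """
    rng = random.Random(seed)
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more non-events than exist.
    n_controls = min(len(controls), controls_per_case * len(cases))
    keep = cases + rng.sample(controls, n_controls)
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def auc(scores, labels):
    """AUC as P(score of an event > score of a non-event), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data with a 2% event rate (20 events among 1000 records).
labels = [1 if i % 50 == 0 else 0 for i in range(1000)]
rows = list(range(1000))

sampled_rows, sampled_labels = case_based_sample(rows, labels, controls_per_case=4)
event_rate = sum(sampled_labels) / len(sampled_labels)

# Hand-checkable AUC example: 3 of the 4 event/non-event pairs are ordered correctly.
toy_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

A boosted classifier from any library could then be fitted on `sampled_rows` and `sampled_labels` and scored with `auc` on held-out data drawn at the original event rate. Predicted probabilities from a model trained on such a sample reflect the sampled event rate and would need recalibration before being read as population probabilities.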
Pages: 373-379
Number of pages: 7