An empirical study of learning from imbalanced data using random forest

Cited by: 266
Authors
Khoshgoftaar, Taghi M. [1 ]
Golawala, Moiz [1 ]
Van Hulse, Jason [1 ]
Affiliation
[1] Florida Atlantic Univ, Dept Comp Sci & Engn, Boca Raton, FL 33431 USA
Source
19TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, VOL II, PROCEEDINGS | 2007
DOI
10.1109/ICTAI.2007.46
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
This paper discusses a comprehensive suite of experiments that analyze the performance of the random forest (RF) learner implemented in Weka. RF is a relatively new learner, and to the best of our knowledge, only preliminary experimentation on the construction of random forest classifiers in the context of imbalanced data has been reported in previous work. Therefore, the contribution of this study is to provide an extensive empirical evaluation of RF learners built from imbalanced data. What should be the recommended default number of trees in the ensemble? What should the recommended value be for the number of attributes? How does the RF learner perform on imbalanced data when compared with other commonly-used learners? We address these and other related issues in this work.
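The two parameters the abstract highlights, the number of trees in the ensemble and the number of attributes sampled per split, can be sketched as follows. This is a minimal illustration only: the paper uses the Weka RF implementation, and scikit-learn is assumed here as a stand-in; the dataset, class ratio, and parameter values are hypothetical, not taken from the study.

```python
# Illustrative sketch (scikit-learn stand-in for Weka's RF learner):
# vary the number of trees (n_estimators) and the per-split attribute
# sample size (max_features) on a synthetic imbalanced dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 5% positive class.
X, y = make_classification(
    n_samples=2000, n_features=20, weights=[0.95, 0.05], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for n_trees in (10, 100):
    rf = RandomForestClassifier(
        n_estimators=n_trees,   # number of trees in the ensemble
        max_features="sqrt",    # attributes considered at each split
        random_state=0,
    )
    rf.fit(X_tr, y_tr)
    # AUC is a common threshold-independent metric for imbalanced data.
    auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
    print(f"{n_trees} trees: AUC = {auc:.3f}")
```

AUC is used here because plain accuracy is misleading when one class dominates; a classifier that always predicts the majority class already scores ~95% accuracy on this data.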
Pages: 310-317
Page count: 8