Geometric mean based boosting algorithm with over-sampling to resolve data imbalance problem for bankruptcy prediction

Cited by: 130
Authors
Kim, Myoung-Jong [1]
Kang, Dae-Ki [2]
Kim, Hong Bae [3]
Affiliations
[1] Pusan Natl Univ, Sch Business, Pusan 609735, South Korea
[2] Dongseo Univ, Dept Comp & Informat Engn, Pusan 617716, South Korea
[3] Dongseo Univ, Div Business, Pusan 617716, South Korea
Keywords
Data imbalance; Bankruptcy prediction; Over-sampling; SMOTE; Cost-sensitive boosting; AdaBoost; GMBoost; Support vector machines; Classification
DOI
10.1016/j.eswa.2014.08.025
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In classification and prediction tasks, the data imbalance problem arises when most instances belong to a single majority class. The problem has received considerable attention in the machine learning community because it is one of the main causes of degraded classifier and predictor performance. In this paper, we propose a geometric mean based boosting algorithm (GMBoost) to resolve the data imbalance problem. GMBoost takes both the majority and minority classes into account during learning because it uses the geometric mean of both classes in its error rate and accuracy calculations. To evaluate GMBoost, we applied it to a bankruptcy prediction task. The results, and a comparative analysis against AdaBoost and cost-sensitive boosting, indicate that GMBoost offers high prediction power and robust learning capability on both imbalanced and balanced data distributions. © 2014 Elsevier Ltd. All rights reserved.
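The abstract's point about using the geometric mean of both classes can be illustrated with a small sketch. The code below is not the GMBoost algorithm and does not reproduce the paper's formulas; it only assumes the common definition of geometric-mean accuracy as the square root of the product of the per-class accuracies. The function names and the toy labels (95 non-bankrupt firms, 5 bankrupt firms) are hypothetical and chosen for illustration.

```python
import math


def class_accuracies(y_true, y_pred, positive=1):
    """Return (accuracy on the positive class, accuracy on the other class)."""
    pos_total = sum(1 for t in y_true if t == positive)
    neg_total = len(y_true) - pos_total
    pos_hit = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    neg_hit = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return pos_hit / pos_total, neg_hit / neg_total


def geometric_mean_accuracy(y_true, y_pred, positive=1):
    """Square root of the product of the two per-class accuracies (assumed definition)."""
    pos_acc, neg_acc = class_accuracies(y_true, y_pred, positive)
    return math.sqrt(pos_acc * neg_acc)


def overall_accuracy(y_true, y_pred):
    """Plain fraction of correct predictions, which the majority class dominates."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy data: 95 majority-class firms (0) and 5 minority-class firms (1).
y_true = [0] * 95 + [1] * 5

# Classifier A labels every firm as majority class.
pred_majority_only = [0] * 100

# Classifier B misses 10 majority firms but finds 4 of the 5 minority firms.
pred_balanced = [0] * 85 + [1] * 10 + [1] * 4 + [0] * 1

acc_a = overall_accuracy(y_true, pred_majority_only)
gm_a = geometric_mean_accuracy(y_true, pred_majority_only)
acc_b = overall_accuracy(y_true, pred_balanced)
gm_b = geometric_mean_accuracy(y_true, pred_balanced)

print(f"A: overall={acc_a:.3f} geometric mean={gm_a:.3f}")
print(f"B: overall={acc_b:.3f} geometric mean={gm_b:.3f}")
```

Under these assumptions, classifier A scores higher on overall accuracy even though it never detects a minority-class case, while its geometric mean collapses to zero. Classifier B trades some majority-class accuracy for minority-class recall and ranks higher on the geometric mean, which is the kind of behavior a geometric-mean-based criterion rewards.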
Pages: 1074-1082
Number of pages: 9