Instance categorization by support vector machines to adjust weights in AdaBoost for imbalanced data classification

Cited by: 88
Authors
Lee, Wonji [1]
Jun, Chi-Hyuck [1]
Lee, Jong-Seok [2]
Affiliations
[1] POSTECH, Dept Ind & Management Engn, Pohang 37673, South Korea
[2] Sungkyunkwan Univ, Dept Ind Engn, Suwon 16419, South Korea
Funding
National Research Foundation of Singapore
Keywords
Class imbalance; SVM; Instance categorization; AdaBoost; Weight adjustment; CLASSIFIERS; SMOTE
DOI
10.1016/j.ins.2016.11.014
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
To address class imbalance in data, we propose a new weight adjustment factor that is applied to a weighted support vector machine (SVM) used as the weak learner of the AdaBoost algorithm. Instances are categorized according to the SVM margin, and a different factor score is computed for each category and assigned to the instances in it. The SVM margin is used to identify borderline and noisy instances, and factor scores are assigned only to borderline instances and to noisy positive instances. The adjustment factor then acts as a multiplier on each instance's AdaBoost weight when the weighted SVM is trained. Using 10 real class-imbalanced datasets, we compare the proposed method with a standard SVM and with SVMs combined with various sampling and boosting methods. Numerical experiments show that the proposed method outperforms existing approaches in terms of F-measure and area under the receiver operating characteristic curve (AUC), indicating that it helps mitigate the class-imbalance problem by addressing well-known causes of performance degradation such as class overlap, small disjuncts, and dataset shift. (C) 2016 Elsevier Inc. All rights reserved.
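The abstract describes the method only at a high level, so the following is a minimal pure-Python sketch of the general idea under stated assumptions, not the authors' implementation. Everything specific here is hypothetical: the margin thresholds (-1 and 1) used to label instances as noisy, borderline, or safe; the factor values `BORDER_FACTOR` and `POS_NOISE_FACTOR`; computing the factors once from an initial, uniformly weighted SVM; and the simple linear SVM trained by subgradient descent that stands in for a proper weighted SVM solver. Labels are assumed to be +1 (positive, minority) and -1 (negative).

```python
import math

# Hypothetical factor values; the abstract does not state the paper's actual scores.
BORDER_FACTOR = 1.5
POS_NOISE_FACTOR = 2.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_weighted_svm(X, y, c, lam=0.01, epochs=200, lr=1.0):
    """Stand-in weighted linear SVM: full-batch subgradient descent on
    lam/2 * ||w||^2 + sum_i c_i * max(0, 1 - y_i * (w.x_i + b))."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi, ci in zip(X, y, c):
            if yi * (dot(w, xi) + b) < 1:  # hinge term is active for this instance
                for j in range(d):
                    gw[j] -= ci * yi * xi[j]
                gb -= ci * yi
        step = lr / math.sqrt(t)  # decaying step size
        w = [wj - step * g for wj, g in zip(w, gw)]
        b -= step * gb
    return w, b


def categorize(margins):
    """Map functional margins y_i * f(x_i) to categories (assumed thresholds)."""
    cats = []
    for m in margins:
        if m < -1:
            cats.append("noisy")       # beyond the margin on the wrong side
        elif m <= 1:
            cats.append("borderline")  # inside the margin band
        else:
            cats.append("safe")
    return cats


def adjustment_factors(cats, y):
    """Factor > 1 only for borderline instances and noisy positive instances."""
    out = []
    for cat, yi in zip(cats, y):
        if cat == "borderline":
            out.append(BORDER_FACTOR)
        elif cat == "noisy" and yi == 1:
            out.append(POS_NOISE_FACTOR)
        else:
            out.append(1.0)  # safe instances and negative noise are left unchanged
    return out


def adaboost_svm(X, y, rounds=5):
    n = len(X)
    uniform = [1.0 / n] * n
    # Assumption: factors come from one initial SVM trained with uniform weights.
    w0, b0 = train_weighted_svm(X, y, uniform)
    margins = [yi * (dot(w0, xi) + b0) for xi, yi in zip(X, y)]
    factors = adjustment_factors(categorize(margins), y)

    weights, ensemble = uniform[:], []
    for _ in range(rounds):
        # The adjustment factor multiplies each AdaBoost weight before SVM training.
        adj = [wi * ai for wi, ai in zip(weights, factors)]
        total = sum(adj)
        w, b = train_weighted_svm(X, y, [a / total for a in adj])
        pred = [1 if dot(w, xi) + b >= 0 else -1 for xi in X]
        err = sum(wi for wi, p, yi in zip(weights, pred, y) if p != yi)
        if err >= 0.5:  # no better than chance: stop boosting
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect weak learner
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, w, b))
        weights = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(weights, y, pred)]
        z = sum(weights)
        weights = [wi / z for wi in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of the weak SVMs."""
    score = sum(a * (1 if dot(w, x) + b >= 0 else -1) for a, w, b in ensemble)
    return 1 if score >= 0 else -1


X = [[2, 2], [2.5, 2], [2, 2.5], [0, 0], [0.5, 0], [0, 0.5], [-1, 0], [0, -1], [0.5, 0.5]]
y = [1, 1, 1, -1, -1, -1, -1, -1, -1]
model = adaboost_svm(X, y)
print(predict(model, [3, 3]), predict(model, [-2, -2]))
```

The design keeps standard discrete AdaBoost intact (weighted error, `alpha`, exponential reweighting) and changes only the weights handed to the weak learner, which is where the abstract says the adjustment factor enters. With real data one would more likely use a library SVM that accepts per-sample weights and evaluate with F-measure and AUC, as in the paper's experiments.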
Pages: 92-103
Number of pages: 12