K-means Clustering based SVM Ensemble Methods for Imbalanced Data Problem

Cited by: 0
Authors
Lee, Jaedong [1]
Lee, Jee-Hyong [1]
Affiliations
[1] Sungkyunkwan Univ, Dept Elect & Comp Engn, Suwon, South Korea
Source
2014 JOINT 7TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 15TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS) | 2014
Keywords
imbalanced data; data membership; k-means clustering; SVM ensemble method
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
When one class contains significantly more or fewer samples than another, a classification algorithm tends to generalize poorly on a specific class; this is known as the imbalanced data problem. In this paper, we propose a novel method to address it. We first partition the data into clusters using the K-means clustering algorithm and train a Support Vector Machine (SVM) classifier on each cluster. Before training each cluster's classifier, we balance that cluster's data using data sampling techniques. Once all classifiers have been trained, we evaluate each classifier's performance on validation data. The final classification of a test sample is obtained by aggregating the classification results of all clusters. The aggregation uses not only the output of each cluster's classifier, but also the credit of each classifier and the sample's membership in each cluster. We verified that the proposed method performs better than existing machine learning algorithms on imbalanced data classification problems.
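
The aggregation step described in the abstract could be sketched roughly as follows. The abstract does not give the formulas, so everything specific here is an assumption: membership is taken as normalized inverse Euclidean distance to each cluster centroid, credit is taken as a validation accuracy in [0, 1], and each per-cluster classifier is a hypothetical stand-in function returning +1 or -1 in place of a trained SVM. The names `memberships` and `aggregate` are illustrative and not from the paper.

```python
import math

def memberships(x, centroids):
    """Soft membership of point x in each cluster.

    Assumption: normalized inverse Euclidean distance to each centroid.
    The abstract does not define "data membership".
    """
    eps = 1e-12  # avoids division by zero when x lies exactly on a centroid
    inv = [1.0 / (math.dist(x, c) + eps) for c in centroids]
    total = sum(inv)
    return [v / total for v in inv]

def aggregate(x, centroids, classifiers, credits):
    """Combine per-cluster votes (+1 / -1), weighting each vote by the
    classifier's credit and by x's membership in that cluster."""
    m = memberships(x, centroids)
    score = sum(mi * ci * clf(x) for mi, ci, clf in zip(m, credits, classifiers))
    return 1 if score >= 0 else -1

# Toy setup: two clusters, with stand-in classifiers instead of trained SVMs.
centroids = [(0.0, 0.0), (10.0, 0.0)]
classifiers = [lambda x: 1, lambda x: -1]  # hypothetical per-cluster models
credits = [0.9, 0.6]                       # assumed validation accuracies

near_first = aggregate((1.0, 0.0), centroids, classifiers, credits)
near_second = aggregate((9.0, 0.0), centroids, classifiers, credits)
```

In this toy setup a point close to the first centroid follows the first classifier's vote and a point close to the second centroid follows the second, because the membership weight dominates the sum. A different membership function or credit measure would change the weighting but not the overall structure.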
Pages: 614-617
Page count: 4