K-means Clustering based SVM Ensemble Methods for Imbalanced Data Problem

Cited by: 0
Authors
Lee, Jaedong [1]
Lee, Jee-Hyong [1]
Affiliations
[1] Sungkyunkwan Univ, Dept Elect & Comp Engn, Suwon, South Korea
Source
2014 JOINT 7TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 15TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS) | 2014
Keywords
imbalanced data; data membership; k-means clustering; SVM ensemble method
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
The imbalanced data problem arises when one class contains significantly more or fewer samples than another: a classification algorithm trained on such data generalizes poorly for one of the classes. In this paper, we propose a novel method to address this problem. We first partition the data into clusters using the K-means clustering algorithm and train a Support Vector Machine (SVM) classifier on each cluster. Before training each classifier, we balance the data within its cluster using data sampling techniques. Once all classifiers have been trained, we evaluate each classifier's performance on validation data. The final classification of the test data is obtained by aggregating the classification results of all clusters. The aggregation uses not only the outputs of the per-cluster classifiers but also the credit of each classifier and the membership of each data point to each cluster. We verify that the proposed method performs better than existing machine learning algorithms on imbalanced data classification problems.
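The aggregation step described in the abstract can be sketched roughly as follows. The abstract does not state the combination rule, so everything specific here is an assumption made for illustration: membership is taken as normalized inverse distance to each cluster centroid, credit is taken as a validation score in [0, 1], and the per-cluster SVMs are replaced by toy functions returning +1 or -1. The names `memberships`, `aggregate`, `centroids`, `classifiers`, and `credits` are hypothetical and do not come from the paper.

```python
import math

def memberships(x, centroids):
    """Soft membership of point x to each cluster.

    Assumption: membership is the normalized inverse Euclidean distance
    to each centroid; the paper may define it differently.
    """
    eps = 1e-12  # avoids division by zero when x sits on a centroid
    inv = [1.0 / (math.dist(x, c) + eps) for c in centroids]
    total = sum(inv)
    return [v / total for v in inv]

def aggregate(x, centroids, classifiers, credits):
    """Combine per-cluster votes (+1 or -1) into one label.

    Each vote is weighted by the point's membership to that cluster and by
    the classifier's credit (assumed here to be a validation score).
    """
    m = memberships(x, centroids)
    score = sum(m_k * w_k * clf(x)
                for m_k, w_k, clf in zip(m, credits, classifiers))
    return 1 if score >= 0 else -1

# Toy stand-ins for two per-cluster SVMs (hypothetical, for illustration only).
centroids = [(0.0, 0.0), (4.0, 0.0)]
classifiers = [lambda x: 1, lambda x: -1]
credits = [0.9, 0.6]  # assumed validation scores of the two classifiers

label_near_first = aggregate((1.0, 0.0), centroids, classifiers, credits)
label_near_second = aggregate((3.5, 0.0), centroids, classifiers, credits)
```

In this sketch a point close to the first centroid is dominated by the first classifier's vote, and a point close to the second centroid by the second's, with each vote scaled by how well that classifier did on validation data. In a real pipeline the toy functions would be replaced by SVMs trained on the balanced data of each cluster.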
Pages: 614-617
Number of pages: 4