K-means Clustering based SVM Ensemble Methods for Imbalanced Data Problem

Cited by: 0
Authors
Lee, Jaedong [1]
Lee, Jee-Hyong [1]
Affiliation
[1] Sungkyunkwan Univ, Dept Elect & Comp Engn, Suwon, South Korea
Source
2014 JOINT 7TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 15TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS) | 2014
Keywords
imbalanced data; data membership; k-means clustering; SVM ensemble method
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
When one class contains significantly more or fewer samples than the other, a machine learning classification algorithm tends to generalize poorly for a particular class; this is known as the imbalanced data problem. In this paper, we propose a novel method to address this problem. We first divide the data into clusters using the K-means clustering algorithm and train a Support Vector Machine (SVM) classifier on each cluster. Before training the classifier for a cluster, we balance the data in that cluster using data sampling techniques. Once all classifiers have been trained, we evaluate each classifier's performance on validation data. The final classification result for the test data is computed by aggregating the classification results of all clusters. The aggregation uses not only the outputs of the per-cluster classifiers but also the credit of each classifier and the membership of the data to each cluster. We verified that the proposed method performs better than existing machine learning algorithms on imbalanced data classification problems.
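The aggregation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything specific here is an assumption: membership is taken as normalized inverse distance to each cluster centroid, credit is taken as a validation accuracy in [0, 1], and the final label is the sign of the membership- and credit-weighted sum of per-cluster decisions. The functions `memberships` and `aggregate` and the toy classifiers are hypothetical stand-ins, not the authors' implementation.

```python
import math

def memberships(x, centroids):
    """Soft membership of x to each cluster.

    Assumption: normalized inverse Euclidean distance to the centroids.
    """
    inv = [1.0 / (math.dist(x, c) + 1e-9) for c in centroids]
    total = sum(inv)
    return [v / total for v in inv]

def aggregate(x, centroids, classifiers, credits):
    """Combine per-cluster decisions in {-1, +1}.

    Each decision is weighted by the classifier's credit and by the
    membership of x to that classifier's cluster (assumed weighting).
    """
    m = memberships(x, centroids)
    score = sum(mk * ck * clf(x)
                for mk, ck, clf in zip(m, credits, classifiers))
    return 1 if score >= 0 else -1

# Toy stand-ins for the per-cluster SVMs (hypothetical decision rules).
centroids = [(0.0, 0.0), (10.0, 10.0)]
classifiers = [
    lambda x: 1 if x[0] > 1 else -1,   # classifier trained on cluster 0
    lambda x: 1 if x[1] > 9 else -1,   # classifier trained on cluster 1
]
credits = [0.9, 0.6]  # e.g. accuracy of each classifier on validation data

# The point lies close to cluster 1, so that classifier's vote dominates
# even though its credit is lower.
label = aggregate((9.0, 8.0), centroids, classifiers, credits)
```

In this sketch a classifier with high credit can still be outvoted when the sample has low membership to its cluster, which is one plausible reading of how the method combines the two weights.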
Pages: 614 - 617
Page count: 4
Related papers
  • [1] K-SVM: An Effective SVM Algorithm Based on K-means Clustering
    Yao, Yukai
    Liu, Yang
    Yu, Yongqing
    Xu, Hong
    Lv, Weiming
    Li, Zhao
    Chen, Xiaoyun
    JOURNAL OF COMPUTERS, 2013, 8 (10) : 2632 - 2639
  • [2] SVM Venn machine with k-means clustering
    Zhou, Chenzhe
    Nouretdinov, Ilia
    Luo, Zhiyuan
    Gammerman, Alex
    IFIP Advances in Information and Communication Technology, 2014, 437 : 251 - 260
  • [3] Clustering of Image Data Using K-Means and Fuzzy K-Means
    Rahmani, Md. Khalid Imam
    Pal, Naina
    Arora, Kamiya
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (07) : 160 - 163
  • [4] Authentication of uncertain data based on k-means clustering
    Unver, Levent
    Gundem, Taflan I.
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2016, 24 (04) : 2910 - 2928
  • [5] K-Means Clustering With Incomplete Data
    Wang, Siwei
    Li, Miaomiao
    Hu, Ning
    Zhu, En
    Hu, Jingtao
    Liu, Xinwang
    Yin, Jianping
    IEEE ACCESS, 2019, 7 : 69162 - 69171
  • [6] A fast SVM training algorithm based on the set segmentation and k-means clustering
    Yang, Xiaowei
    Progress in Natural Science, 2003, (10) : 30 - 35
  • [7] A fast SVM training algorithm based on the set segmentation and k-means clustering
    Yang, XW
    Lin, DY
    Hao, ZF
    Liang, YC
    Liu, GR
    Han, X
    PROGRESS IN NATURAL SCIENCE-MATERIALS INTERNATIONAL, 2003, 13 (10) : 750 - 755
  • [8] Imbalanced data optimization combining K-means and SMOTE
    Li W.
    International Journal of Performability Engineering, 2019, 15 (08) : 2173 - 2181
  • [9] Vegetable Disease Detection Using K-Means Clustering And SVM
    Rahamathunnisa, U.
    Nallakaruppan, M. K.
    Anith, A.
    Kumar, K. S. Sendhil
    2020 6TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION SYSTEMS (ICACCS), 2020 : 1308 - 1311
  • [10] Adapting K-Means Algorithm for Pair-Wise Constrained Clustering of Imbalanced Data Streams
    Wojciechowski, Szymon
    Gonzalez-Almagro, German
    Garcia, Salvador
    Wozniak, Michal
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2022, 2022, 13469 : 153 - 163