K-means Clustering based SVM Ensemble Methods for Imbalanced Data Problem

Authors
Lee, Jaedong [1]
Lee, Jee-Hyong [1]
Affiliation
[1] Sungkyunkwan Univ, Dept Elect & Comp Engn, Suwon, South Korea
Source
2014 JOINT 7TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 15TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS) | 2014
Keywords
imbalanced data; data membership; k-means clustering; SVM ensemble method
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
When one class contains significantly more or fewer samples than another, a machine learning classification algorithm tends to generalize poorly on a specific class; this is called the imbalanced data problem. In this paper, we propose a novel method to solve the imbalanced data problem. We first divide the data into clusters using the K-means clustering algorithm and build a classifier for each cluster using the Support Vector Machine (SVM) method. Before building the classifier for a cluster, we balance the data in that cluster using data sampling techniques. Once all classifiers are built, we validate each classifier's performance on validation data. The final classification result for the test data is computed by aggregating the classification results of all clusters. The aggregation uses not only the outputs of the per-cluster classifiers but also the credit of each classifier and the membership of the data to each cluster. We have verified that the proposed classification method performs better than existing machine learning algorithms on imbalanced data classification problems.
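The aggregation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything specific here is an assumption: membership is taken as normalized inverse distance to each cluster centroid, the credit of a classifier is taken as its validation accuracy, and each per-cluster classifier is reduced to a vote of +1 or -1. The function names `memberships` and `aggregate` are hypothetical, and the paper's actual weighting may differ.

```python
import math

def memberships(x, centroids):
    """Soft membership of x to each cluster.

    Assumption: normalized inverse Euclidean distance to the centroid.
    The abstract does not state how membership is computed.
    """
    inv = [1.0 / (math.dist(x, c) + 1e-12) for c in centroids]
    total = sum(inv)
    return [v / total for v in inv]

def aggregate(x, centroids, credits, votes):
    """Combine per-cluster votes (+1 or -1) into one label.

    Each vote is weighted by the classifier's credit (assumed here to be
    its validation accuracy) and by the membership of x to that cluster.
    """
    m = memberships(x, centroids)
    score = sum(mk * ck * vk for mk, ck, vk in zip(m, credits, votes))
    label = 1 if score >= 0 else -1
    return label, score

# Toy example: two clusters whose classifiers disagree about the point.
centroids = [(0.0, 0.0), (4.0, 0.0)]   # illustrative cluster centers
credits = [0.9, 0.6]                   # illustrative validation accuracies
votes = [+1, -1]                       # illustrative per-cluster SVM outputs
label, score = aggregate((1.0, 0.0), centroids, credits, votes)
print(label, round(score, 3))
```

In this toy case the point lies closer to the first centroid, whose classifier also has the higher credit, so that classifier's vote dominates the combined score.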
Pages: 614-617
Number of pages: 4