K-means Clustering based SVM Ensemble Methods for Imbalanced Data Problem

Authors
Lee, Jaedong [1]
Lee, Jee-Hyong [1]
Affiliation
[1] Sungkyunkwan Univ, Dept Elect & Comp Engn, Suwon, South Korea
Source
2014 JOINT 7TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 15TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS) | 2014
Keywords
imbalanced data; data membership; k-means clustering; SVM ensemble method
DOI
Not available
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
When one class contains significantly more or fewer samples than another, a machine learning classifier tends to generalize poorly for a specific class; this is called the imbalanced data problem. In this paper, we propose a novel method to address it. We first divide the data into clusters using the K-means clustering algorithm and train a Support Vector Machine (SVM) classifier on each cluster. Before training the classifier for a cluster, we balance that cluster's data using data sampling techniques. Once all classifiers are trained, we evaluate each classifier's performance on validation data. The final classification of the test data is computed by aggregating the classification results of all clusters. The aggregation uses not only the outputs of the per-cluster classifiers but also the credit of each classifier and the membership of the data in each cluster. We have verified that the proposed method performs better than existing machine learning algorithms on imbalanced data classification problems.
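The aggregation step described in the abstract might look roughly like the sketch below. The abstract does not give the formulas, so everything here is an assumption made for illustration: membership is taken to be a normalized inverse distance to each cluster centroid, the credit of a classifier is taken to be a validation score in [0, 1], and the per-cluster SVMs are replaced by simple stand-in functions that return +1 or -1. The names `memberships` and `aggregate` are hypothetical and do not come from the paper.

```python
import math

def memberships(x, centroids, eps=1e-9):
    """Soft membership of x in each cluster.

    Assumption: normalized inverse Euclidean distance to the centroid.
    The paper's actual membership definition is not given in the abstract.
    """
    inverse = [1.0 / (math.dist(x, c) + eps) for c in centroids]
    total = sum(inverse)
    return [v / total for v in inverse]

def aggregate(x, centroids, classifiers, credits):
    """Combine per-cluster decisions into one label.

    Each classifier returns +1 or -1. Its vote is weighted by its credit
    (assumed here to be a validation score) and by the membership of x in
    the classifier's cluster. Returns (label, weighted_score).
    """
    weights = memberships(x, centroids)
    score = sum(m * c * clf(x) for m, c, clf in zip(weights, credits, classifiers))
    return (1 if score >= 0 else -1), score

# Hypothetical setup: two clusters, with stand-ins for the trained SVMs.
centroids = [(0.0, 0.0), (10.0, 0.0)]
classifiers = [
    lambda p: 1 if p[1] > 0 else -1,  # stand-in for the SVM of cluster 0
    lambda p: 1 if p[1] > 5 else -1,  # stand-in for the SVM of cluster 1
]
credits = [0.9, 0.6]  # assumed validation scores of the two classifiers

near_first, score_first = aggregate((1.0, 2.0), centroids, classifiers, credits)
near_second, score_second = aggregate((9.0, 2.0), centroids, classifiers, credits)
```

In this toy setup the two stand-in classifiers disagree on both query points, and the point closer to a given centroid follows that cluster's classifier, which is the intended effect of weighting votes by membership.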
Pages: 614-617
Number of pages: 4