K-means Clustering based SVM Ensemble Methods for Imbalanced Data Problem

Cited by: 0

Authors:

Lee, Jaedong [1]

Lee, Jee-Hyong [1]

Affiliation:

[1] Sungkyunkwan Univ, Dept Elect & Comp Engn, Suwon, South Korea

Source:

2014 JOINT 7TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 15TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS) | 2014

Keywords:

imbalanced data; data membership; k-means clustering; SVM ensemble method

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

When one class contains significantly more or fewer samples than another, a classifier trained by a machine learning algorithm tends to be biased toward one specific class; this is called the imbalanced data problem. In this paper, we propose a novel method to solve the imbalanced data problem. We first divide the data into clusters using the K-means clustering algorithm and then build a Support Vector Machine (SVM) classifier on each cluster. Before training each cluster's classifier, we balance that cluster's data using data sampling techniques. After a classifier has been built for every cluster, we validate each classifier's performance on validation data. The final classification of the test data is obtained by aggregating the classification results of all clusters, using not only the output of each cluster's classifier but also the credit of each classifier and the membership of the data in each cluster. We verified that the proposed classification method performs better than existing machine learning algorithms on the imbalanced data classification problem.
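The pipeline described in the abstract (K-means clustering, per-cluster balancing by sampling, one SVM per cluster, a validation-based credit for each classifier, and aggregation weighted by credit and cluster membership) can be illustrated with the minimal, dependency-free Python sketch below. It is not the authors' implementation. The abstract does not specify the sampling technique, the SVM kernel or solver, the credit measure, the membership function, or the combination rule, so each of these is an assumption here: random oversampling, a linear SVM fitted with Pegasos stochastic subgradient steps, balanced validation accuracy as the credit `c_k`, normalized inverse squared distance to each centroid as the membership `u_k(x)`, and the prediction `sign(sum_k u_k(x) * c_k * f_k(x))`, where `f_k(x)` is the +1/-1 output of cluster k's SVM. The farthest-first K-means initialization and all names (`ClusterSVMEnsemble`, `oversample`, `train_linear_svm`, and so on) are likewise hypothetical.

```python
import math
import random


def nearest(p, centroids):
    """Index of the closest centroid (the hard K-means assignment)."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))


def kmeans(points, k, iters=50):
    """Lloyd's K-means; farthest-first initialization is an assumption for determinism."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        centroids.append(list(max(points, key=lambda p: min(math.dist(p, c) for c in centroids))))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def oversample(X, y, rng):
    """Assumed sampling technique: duplicate smaller-class samples until both classes match."""
    by_class = {c: [x for x, t in zip(X, y) if t == c] for c in (1, -1)}
    n_max = max(len(xs) for xs in by_class.values())
    Xb, yb = [], []
    for c, xs in by_class.items():
        xs = xs + [rng.choice(xs) for _ in range(n_max - len(xs))]
        Xb, yb = Xb + xs, yb + [c] * len(xs)
    return Xb, yb


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Assumed solver: linear SVM (L2-regularized hinge loss) fitted with Pegasos steps."""
    cols = list(zip(*X))
    mean = [sum(c) / len(c) for c in cols]
    std = [max((sum((v - m) ** 2 for v in c) / len(c)) ** 0.5, 1e-12) for c, m in zip(cols, mean)]

    def feats(x):  # standardize, then append a constant 1 whose weight acts as a (regularized) bias
        return [(v - m) / s for v, m, s in zip(x, mean, std)] + [1.0]

    Xa, w = [feats(x) for x in X], [0.0] * (len(cols) + 1)
    rng, order, t = random.Random(seed), list(range(len(Xa))), 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink step from the L2 term
            if margin < 1:  # hinge loss active: step toward classifying X[i] correctly
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return lambda x: 1 if sum(wj * xj for wj, xj in zip(w, feats(x))) >= 0 else -1


def balanced_accuracy(clf, X, y):
    """Mean per-class recall over the classes present in y."""
    hits = [[clf(x) == c for x, t in zip(X, y) if t == c] for c in set(y)]
    return sum(sum(h) / len(h) for h in hits) / len(hits)


class ClusterSVMEnsemble:
    def __init__(self, k=2, seed=0):
        self.k, self.seed = k, seed

    def fit(self, X, y, X_val, y_val):
        rng = random.Random(self.seed)
        self.centroids = kmeans(X, self.k)
        tr = [nearest(x, self.centroids) for x in X]
        va = [nearest(x, self.centroids) for x in X_val]
        self.clfs, self.credits = [], []
        for j in range(self.k):
            Xj, yj = [x for x, a in zip(X, tr) if a == j], [t for t, a in zip(y, tr) if a == j]
            if len(set(yj)) < 2:  # empty or single-class cluster: constant prediction
                label = yj[0] if yj else max((1, -1), key=y.count)
                clf = lambda x, label=label: label
            else:  # balance this cluster first, then train its SVM
                clf = train_linear_svm(*oversample(Xj, yj, rng), seed=self.seed)
            # assumed credit: balanced accuracy on this cluster's validation samples (0.5 if none)
            Xv, yv = [x for x, a in zip(X_val, va) if a == j], [t for t, a in zip(y_val, va) if a == j]
            self.credits.append(balanced_accuracy(clf, Xv, yv) if yv else 0.5)
            self.clfs.append(clf)
        return self

    def membership(self, x, eps=1e-9):
        """Assumed membership: normalized inverse squared distance to each centroid."""
        inv = [1.0 / (math.dist(x, c) ** 2 + eps) for c in self.centroids]
        return [v / sum(inv) for v in inv]

    def predict(self, x):
        """Assumed aggregation: sign of the membership- and credit-weighted vote."""
        u = self.membership(x)
        score = sum(uj * cj * clf(x) for uj, cj, clf in zip(u, self.credits, self.clfs))
        return 1 if score >= 0 else -1
```

Usage: `ClusterSVMEnsemble(k=2).fit(X, y, X_val, y_val)` expects lists of numeric feature tuples with labels +1/-1 and returns the fitted model, whose `predict(x)` gives +1 or -1. Because each vote is scaled by both membership and credit, a point close to one centroid is decided mainly by that cluster's SVM, and a classifier that validated poorly carries less weight everywhere.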


Pages: 614-617

Number of pages: 4
