FuzzyCSampling: A Hybrid fuzzy c-means clustering sampling strategy for imbalanced datasets

Cited by: 1

Authors:

Maras, Abdullah [1]

Selcukcan Erol, Cigdem [1, 2, 3]

Affiliations:

[1] Istanbul Univ, Inst Sci, Div Informat, Istanbul, Turkiye

[2] Istanbul Univ, Informat Dept, Istanbul, Turkiye

[3] Istanbul Univ, Fac Sci, Dept Biol, Div Bot, Istanbul, Turkiye

Source:

TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES | 2023 / Vol. 31 / No. 07

Keywords:

Binary classification; imbalanced datasets; machine learning; sampling; fuzzy c-means; DATA-SETS; CLASSIFICATION; SMOTE; PREDICTION; ALGORITHM;

DOI:

10.55730/1300-0632.4044

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Classification with imbalanced datasets has recently become one of the most researched areas in machine learning applications, since such datasets tend to produce low-performing models. A dataset is imbalanced when the classes of the target variable have uneven numbers of examples. The most prevalent solutions can be categorized as data preprocessing, ensemble techniques, and cost-sensitive learning. In this article, we propose a new hybrid approach for binary classification, named FuzzyCSampling, which aims to increase model performance by combining fuzzy c-means clustering with data sampling. The article compares the results of the proposed approach not only to a base model built on the imbalanced dataset but also to previously presented state-of-the-art solutions: undersampling, SMOTE oversampling, and Borderline-SMOTE oversampling. The evaluation metrics used for the comparison are accuracy, ROC AUC score, precision, recall, and F1-score. We evaluated the proposed method on three datasets with different imbalance ratios and with three machine learning algorithms (k-nearest neighbors, support vector machines, and random forest). According to the experiments, FuzzyCSampling is an effective way to improve model performance on imbalanced datasets.
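The abstract lists accuracy alongside precision, recall, and F1-score. The sketch below is an illustration of why accuracy alone can mislead on an imbalanced binary dataset; it is not taken from the article. The helper name `binary_metrics`, the 95:5 class split, and both sets of predictions are hypothetical values chosen for the example.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced labels: 95 majority examples (0), 5 minority examples (1).
y_true = [0] * 95 + [1] * 5

# Model A (hypothetical): always predicts the majority class.
pred_majority = [0] * 100

# Model B (hypothetical): raises 6 false alarms but finds 4 of the 5 minority examples.
pred_balanced = [1] * 6 + [0] * 89 + [1] * 4 + [0] * 1

baseline = binary_metrics(y_true, pred_majority)
improved = binary_metrics(y_true, pred_balanced)

for name, m in (("always-majority", baseline), ("minority-aware", improved)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

In this made-up case the always-majority model has the higher accuracy even though it never detects a minority example, which is one reason a comparison on imbalanced data reports recall and F1-score as well.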

Citation

Pages: 1223-1236

Page count: 15
