FuzzyCSampling: A Hybrid fuzzy c-means clustering sampling strategy for imbalanced datasets

Cited by: 1
Authors
Maras, Abdullah [1]
Selcukcan Erol, Cigdem [1, 2, 3]
Affiliations
[1] Istanbul Univ, Inst Sci, Div Informat, Istanbul, Turkiye
[2] Istanbul Univ, Informat Dept, Istanbul, Turkiye
[3] Istanbul Univ, Fac Sci, Dept Biol, Div Bot, Istanbul, Turkiye
Keywords
Binary classification; imbalanced datasets; machine learning; sampling; fuzzy c-means; DATA-SETS; CLASSIFICATION; SMOTE; PREDICTION; ALGORITHM
DOI
10.55730/1300-0632.4044
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Building classification models on imbalanced datasets has recently become one of the most actively researched areas in machine learning applications, because such datasets tend to produce low-performing models. A dataset is imbalanced when the classes of its target variable have unequal numbers of examples. The most prevalent solutions can be categorized as data preprocessing, ensemble techniques, and cost-sensitive learning. In this article, we propose a new hybrid approach for binary classification, named FuzzyCSampling, which aims to increase model performance by combining fuzzy c-means clustering with data sampling. We compare the results of the proposed approach not only to a base model built on the imbalanced dataset but also to previously presented state-of-the-art solutions: undersampling, SMOTE oversampling, and Borderline-SMOTE oversampling. The evaluation metrics used for the comparison are accuracy, ROC AUC score, precision, recall, and F1-score. We evaluated the proposed method on three datasets with different imbalance ratios and with three machine learning algorithms (k-nearest neighbors, support vector machines, and random forest). According to the experiments, FuzzyCSampling is an effective way to improve model performance on imbalanced datasets.
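
As a rough illustration of why the abstract reports precision, recall, and F1-score alongside accuracy, the sketch below computes these metrics on a toy imbalanced label set. It is not taken from the paper: the function name `binary_metrics`, the 95/5 class split, and both sets of predictions are hypothetical values chosen only to show how accuracy can look strong while the minority class is missed entirely.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels: 95 majority (0) and 5 minority (1) examples.
y_true = [0] * 95 + [1] * 5

# A degenerate model that always predicts the majority class.
baseline = binary_metrics(y_true, [0] * 100)

# A hypothetical model that finds 4 of the 5 minority examples
# at the cost of 10 false positives.
y_pred = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1
sampled = binary_metrics(y_true, y_pred)

print("always-majority:", baseline)
print("minority-aware: ", sampled)
```

In this toy setting the always-majority model has the higher accuracy but zero recall and F1 on the minority class, which is why a comparison of sampling strategies on imbalanced data usually looks at several metrics together.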
Pages: 1223-1236
Page count: 15