FuzzyCSampling: A Hybrid fuzzy c-means clustering sampling strategy for imbalanced datasets

Cited by: 1
Authors
Maras, Abdullah [1]
Selcukcan Erol, Cigdem [1, 2, 3]
Affiliations
[1] Istanbul Univ, Inst Sci, Div Informat, Istanbul, Turkiye
[2] Istanbul Univ, Informat Dept, Istanbul, Turkiye
[3] Istanbul Univ, Fac Sci, Dept Biol, Div Bot, Istanbul, Turkiye
Keywords
Binary classification; imbalanced datasets; machine learning; sampling; fuzzy c-means; DATA-SETS; CLASSIFICATION; SMOTE; PREDICTION; ALGORITHM;
DOI
10.55730/1300-0632.4044
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Building classification models on imbalanced datasets has recently become one of the most researched areas in machine learning, because such datasets tend to produce low-performing models. A dataset is imbalanced when the classes of the target variable have an uneven number of examples. The most prevalent solutions can be categorized as data preprocessing, ensemble techniques, and cost-sensitive learning. In this article, we propose a new hybrid approach for binary classification, named FuzzyCSampling, which aims to increase model performance by combining fuzzy c-means clustering with data sampling. We compare the results of the proposed approach not only to a base model built on the imbalanced dataset but also to previously presented state-of-the-art solutions: undersampling, SMOTE oversampling, and Borderline-SMOTE oversampling. The evaluation metrics used for the comparison are accuracy, ROC AUC score, precision, recall, and F1-score. We evaluated the proposed method on three datasets with different imbalance ratios and with three machine learning algorithms (k-nearest neighbors, support vector machines, and random forest). According to the experiments, FuzzyCSampling is an effective way to improve model performance on imbalanced datasets.
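The abstract lists accuracy alongside precision, recall, and F1-score. The following minimal sketch, which is not taken from the article, illustrates why accuracy alone can mislead on an imbalanced binary dataset. The function name `binary_metrics`, the 95/5 class split, and both sets of predictions are hypothetical and chosen only for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels.

    Hypothetical helper for illustration; ratios with a zero
    denominator are reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Assumed toy data: 95 majority-class (0) and 5 minority-class (1) examples.
y_true = [0] * 95 + [1] * 5

# A degenerate model that always predicts the majority class.
always_majority = [0] * 100
baseline = binary_metrics(y_true, always_majority)

# A hypothetical model that finds 4 of the 5 minority examples
# at the cost of 10 false alarms on the majority class.
y_pred = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1
improved = binary_metrics(y_true, y_pred)

print("baseline:", baseline)
print("improved:", improved)
```

In this toy setting the always-majority model has the higher accuracy yet never detects the minority class, so its recall and F1-score are zero. This is the usual reason for reporting several metrics when comparing sampling strategies.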
Pages: 1223-1236
Number of pages: 15