FuzzyCSampling: A Hybrid fuzzy c-means clustering sampling strategy for imbalanced datasets

Authors
Maras, Abdullah [1]
Selcukcan Erol, Cigdem [1, 2, 3]
Affiliations
[1] Istanbul Univ, Inst Sci, Div Informat, Istanbul, Turkiye
[2] Istanbul Univ, Informat Dept, Istanbul, Turkiye
[3] Istanbul Univ, Fac Sci, Dept Biol, Div Bot, Istanbul, Turkiye
Keywords
Binary classification; imbalanced datasets; machine learning; sampling; fuzzy c-means; DATA-SETS; CLASSIFICATION; SMOTE; PREDICTION; ALGORITHM
DOI
10.55730/1300-0632.4044
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classification with imbalanced datasets has recently become one of the most researched areas in machine learning applications, since such datasets lead to low-performing machine learning models. A dataset is imbalanced when the classes of the target variable have an uneven number of examples. The most prevalent solutions to imbalanced datasets can be categorized as data preprocessing, ensemble techniques, and cost-sensitive learning. In this article, we propose a new hybrid approach for binary classification, named FuzzyCSampling, which aims to increase model performance by combining fuzzy c-means clustering with data sampling. This article compares the proposed approach's results not only to a base model built on the imbalanced dataset but also to previously presented state-of-the-art solutions: undersampling, SMOTE oversampling, and Borderline-SMOTE oversampling. The model evaluation metrics for the comparison are accuracy, ROC AUC score, precision, recall, and F1-score. We evaluated the proposed method on three datasets with different imbalance ratios and with three machine learning algorithms (k-nearest neighbors, support vector machines, and random forest). According to the experiments, FuzzyCSampling is an effective way to improve model performance on imbalanced datasets.
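The abstract's point that imbalanced data yields low-performing models, and its choice of precision, recall, and F1-score alongside accuracy, can be illustrated with a small sketch. The code below is not taken from the article: the helper name `binary_metrics`, the 5:95 class split, and the always-predict-majority baseline are all assumptions chosen for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Hypothetical helper for illustration; not the article's evaluation code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when no positives are predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Assumed toy dataset: 5 minority (positive) and 95 majority (negative) examples.
y_true = [1] * 5 + [0] * 95
# A degenerate baseline that always predicts the majority class.
y_pred = [0] * 100

baseline = binary_metrics(y_true, y_pred)
print(baseline)
```

On this toy split the baseline scores high on accuracy while its recall and F1-score on the minority class are zero, which is why accuracy alone is a weak yardstick when comparing sampling strategies.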
Pages: 1223-1236
Page count: 15