FuzzyCSampling: A Hybrid fuzzy c-means clustering sampling strategy for imbalanced datasets

Times cited: 1
Authors
Maras, Abdullah [1]
Selcukcan Erol, Cigdem [1,2,3]
Affiliations
[1] Istanbul Univ, Inst Sci, Div Informat, Istanbul, Turkiye
[2] Istanbul Univ, Informat Dept, Istanbul, Turkiye
[3] Istanbul Univ, Fac Sci, Dept Biol, Div Bot, Istanbul, Turkiye
Keywords
Binary classification; imbalanced datasets; machine learning; sampling; fuzzy c-means; DATA-SETS; CLASSIFICATION; SMOTE; PREDICTION; ALGORITHM
DOI
10.55730/1300-0632.4044
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Classification on imbalanced datasets has recently become one of the most researched areas in machine learning applications, since such datasets tend to produce low-performing models. A dataset is imbalanced when the classes of the target variable have uneven numbers of examples. The most prevalent solutions to imbalanced datasets fall into three categories: data preprocessing, ensemble techniques, and cost-sensitive learning. In this article, we propose a new hybrid approach for binary classification, named FuzzyCSampling, which aims to increase model performance by combining fuzzy c-means clustering with data sampling. We compare the proposed approach's results not only with a base model built on the imbalanced dataset but also with previously presented state-of-the-art solutions: undersampling, SMOTE oversampling, and Borderline-SMOTE oversampling. The evaluation metrics for the comparison are accuracy, ROC AUC score, precision, recall, and F1-score. We evaluated the proposed method on three datasets with different imbalance ratios, using three machine learning algorithms (k-nearest neighbors, support vector machines, and random forest). According to the experiments, FuzzyCSampling is an effective way to improve model performance on imbalanced datasets.
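The abstract does not spell out how FuzzyCSampling combines clustering with sampling, so the following is only an illustrative Python sketch under stated assumptions. It implements standard fuzzy c-means (memberships updated as u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)), centers as membership-weighted means with weights u_ij^m) and then a hypothetical undersampling step, fcm_undersample, that keeps the majority-class point with the highest membership in each fuzzy cluster. The function names, the fuzzifier m = 2, and the keep-one-point-per-cluster rule are assumptions made for illustration, not the authors' procedure.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means (illustrative sketch).

    Returns (centers, u) where u[i][j] is the membership of point i in cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial membership matrix; each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Center update: mean of all points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            wsum = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / wsum
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        exp = 2.0 / (m - 1.0)
        new_u = []
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            if min(dists) == 0.0:
                # Point sits exactly on a center: assign it fully to that cluster.
                hit = dists.index(0.0)
                row = [1.0 if j == hit else 0.0 for j in range(c)]
            else:
                row = [1.0 / sum((dists[j] / dists[k]) ** exp for k in range(c))
                       for j in range(c)]
            new_u.append(row)
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


def fcm_undersample(majority, n_keep, m=2.0, seed=0):
    """Hypothetical undersampler: split the majority class into n_keep fuzzy
    clusters and keep the highest-membership point of each cluster
    (at most n_keep points, since two clusters may pick the same point)."""
    _, u = fuzzy_c_means(majority, n_keep, m=m, seed=seed)
    kept = []
    for j in range(n_keep):
        best = max(range(len(majority)), key=lambda i: u[i][j])
        if best not in kept:
            kept.append(best)
    return [majority[i] for i in kept]
```

For example, with a majority class of 1,000 points and a minority class of 100, calling fcm_undersample(majority, 100) would return at most 100 representative majority points to pair with the minority class. Selecting the highest-membership point rather than the cluster center keeps only real observations in the training set.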
Pages: 1223-1236
Number of pages: 15