Imbalanced Data Over-Sampling Method Based on ISODATA Clustering

Cited by: 0
Authors
Lv, Zhenzhe [1]
Liu, Qicheng [1]
Affiliations
[1] Yantai Univ, Sch Comp & Control Engn, Yantai 264000, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
imbalanced data; clustering; oversampling; ISODATA; SMOTE
DOI
10.1587/transinf.2022EDP7190
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Class imbalance is one of the challenges faced in machine learning. Traditional classifiers have difficulty predicting minority class data, and if the imbalanced data is not processed, classifier performance is greatly reduced. To address the tendency of traditional classifiers to favor the majority class and ignore the minority class, an imbalanced data over-sampling method based on iterative self-organizing data analysis technique algorithm (ISODATA) clustering is proposed. The minority class is divided into sub-clusters by ISODATA, and each sub-cluster is over-sampled according to the sampling ratio, so that the sampled minority class data also conforms to the imbalance of the original minority class data. The new dataset, composed of the new minority class data and the majority class data, is classified by SVM and Random Forest classifiers. Experiments on 12 datasets from the KEEL datasets show that the method achieves better G-mean and F-value, improving classification accuracy.
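The per-sub-cluster over-sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cluster labels are taken as given (in the paper they would come from ISODATA), the sampling ratio is assumed to be each sub-cluster's share of the minority class, and new points are generated by SMOTE-style interpolation between two members of the same sub-cluster. The function names `allocate_per_cluster` and `oversample_by_cluster` are hypothetical.

```python
import random


def allocate_per_cluster(cluster_sizes, n_new):
    """Split n_new synthetic samples across clusters in proportion to size.

    Assumption: the "sampling ratio" is each sub-cluster's share of the
    minority class; the abstract does not give the exact definition.
    """
    total = sum(cluster_sizes)
    # Floor of the proportional share for each cluster.
    quotas = [n_new * size // total for size in cluster_sizes]
    # Give the leftover samples to the clusters with the largest
    # fractional remainders so the quotas sum to n_new.
    leftover = n_new - sum(quotas)
    by_fraction = sorted(
        range(len(cluster_sizes)),
        key=lambda i: (n_new * cluster_sizes[i]) % total,
        reverse=True,
    )
    for i in by_fraction[:leftover]:
        quotas[i] += 1
    return quotas


def oversample_by_cluster(minority, labels, n_new, seed=0):
    """Generate n_new synthetic minority points, sub-cluster by sub-cluster.

    minority: list of feature vectors (lists of floats).
    labels:   sub-cluster label for each vector (assumed precomputed,
              e.g. by an ISODATA implementation).
    """
    rng = random.Random(seed)
    # Group minority points by their sub-cluster label.
    clusters = {}
    for point, label in zip(minority, labels):
        clusters.setdefault(label, []).append(point)

    keys = sorted(clusters)
    quotas = allocate_per_cluster([len(clusters[k]) for k in keys], n_new)

    synthetic = []
    for key, quota in zip(keys, quotas):
        members = clusters[key]
        for _ in range(quota):
            # SMOTE-style interpolation between two members of the same
            # sub-cluster, so new points stay inside that sub-cluster.
            a = rng.choice(members)
            b = rng.choice(members)
            t = rng.random()
            synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic
```

Because the quota of each sub-cluster is proportional to its size, the relative sizes of the sub-clusters are roughly preserved after sampling, which is one reading of the abstract's statement that the sampled data conforms to the imbalance of the original minority class.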
Pages: 1528-1536
Number of pages: 9