Imbalanced Data Over-Sampling Method Based on ISODATA Clustering

Authors
Lv, Zhenzhe [1]
Liu, Qicheng [1]
Affiliation
[1] Yantai Univ, Sch Comp & Control Engn, Yantai 264000, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
imbalanced data; clustering; oversampling; ISODATA; SMOTE;
DOI
10.1587/transinf.2022EDP7190
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Class imbalance is one of the challenges in machine learning: traditional classifiers have difficulty predicting minority class data, and if imbalanced data is not processed, classifier performance drops considerably. To address the tendency of traditional classifiers to favor the majority class and ignore the minority class, an imbalanced data over-sampling method based on iterative self-organizing data analysis technique algorithm (ISODATA) clustering is proposed. ISODATA divides the minority class into sub-clusters, and each sub-cluster is over-sampled according to a sampling ratio, so that the over-sampled minority class preserves the within-class imbalance of the original minority class data. The new dataset, composed of the new minority class data and the majority class data, is classified with SVM and Random Forest classifiers. Experiments on 12 datasets from the KEEL repository show that the method achieves better G-mean and F-value and improves classification accuracy.
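The abstract does not give the exact allocation rule or interpolation details, so the following Python sketch shows only one plausible reading of the per-sub-cluster over-sampling step. It assumes the minority class has already been partitioned into sub-clusters (for example by ISODATA, which is not reimplemented here), that the goal is to generate enough synthetic samples to match the majority class size, that this total is split across sub-clusters in proportion to their sizes, and that each sub-cluster is filled with SMOTE-style interpolation between a seed point and one of its k nearest neighbours in the same sub-cluster. The function names `allocate_counts`, `smote_in_cluster`, `oversample_by_cluster`, the balancing target, the largest-remainder rounding and the duplication fallback for single-point sub-clusters are hypothetical choices, not the authors' implementation. The two metric helpers use the standard definitions of G-mean and F-value with the minority class as the positive class.

```python
import math
import random
from typing import Dict, List, Sequence

Point = List[float]


def allocate_counts(cluster_sizes: Sequence[int], n_synthetic: int) -> List[int]:
    """Split n_synthetic across sub-clusters in proportion to their sizes.

    Hypothetical rule: largest-remainder rounding keeps the total exact.
    """
    total = sum(cluster_sizes)
    quotas = [n_synthetic * s / total for s in cluster_sizes]
    counts = [math.floor(q) for q in quotas]
    leftover = n_synthetic - sum(counts)
    # Give the remaining samples to the clusters with the largest fractional parts.
    order = sorted(range(len(quotas)), key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def smote_in_cluster(points: List[Point], n_new: int, k: int, rng: random.Random) -> List[Point]:
    """SMOTE-style interpolation restricted to the points of one sub-cluster."""
    if n_new == 0:
        return []
    if len(points) == 1:
        # No neighbour to interpolate with: duplicate the point (assumption).
        return [list(points[0]) for _ in range(n_new)]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(points))
        seed = points[i]
        # k nearest neighbours of the seed, searched only inside this sub-cluster.
        others = points[:i] + points[i + 1:]
        others.sort(key=lambda p: math.dist(p, seed))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1): position on the seed-neighbour segment
        synthetic.append([s + gap * (n - s) for s, n in zip(seed, neighbour)])
    return synthetic


def oversample_by_cluster(clusters: Dict[int, List[Point]], n_majority: int,
                          k: int = 5, seed: int = 0) -> Dict[int, List[Point]]:
    """Return the synthetic samples generated for each sub-cluster label."""
    rng = random.Random(seed)
    labels = sorted(clusters)
    sizes = [len(clusters[c]) for c in labels]
    # Assumed target: enough synthetic points to equal the majority class size.
    n_synthetic = max(0, n_majority - sum(sizes))
    counts = allocate_counts(sizes, n_synthetic)
    return {c: smote_in_cluster(clusters[c], n, k, rng) for c, n in zip(labels, counts)}


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of minority-class recall and majority-class specificity."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)


def f_value(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-measure of the minority (positive) class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)


if __name__ == "__main__":
    # Two minority sub-clusters (sizes 3 and 1) and a majority class of 12 samples.
    minority = {0: [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], 1: [[5.0, 5.0]]}
    new = oversample_by_cluster(minority, n_majority=12)
    print({c: len(v) for c, v in new.items()})  # {0: 6, 1: 2}
```

In this example the 8 synthetic samples are divided 6:2, matching the 3:1 size ratio of the two sub-clusters, which is how proportional allocation keeps the within-class imbalance of the original minority data. Restricting neighbour search to a single sub-cluster also means synthetic points are never interpolated across sub-cluster boundaries.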
Pages: 1528-1536
Number of pages: 9