Imbalanced Data Over-Sampling Method Based on ISODATA Clustering

Cited by: 0
Authors
Lv, Zhenzhe [1]
Liu, Qicheng [1]
Affiliations
[1] Yantai Univ, Sch Comp & Control Engn, Yantai 264000, Shandong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
imbalanced data; clustering; oversampling; ISODATA; SMOTE;
DOI
10.1587/transinf.2022EDP7190
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Class imbalance is one of the challenges faced in machine learning. Traditional classifiers have difficulty predicting minority class data, and if imbalanced data is left unprocessed, classifier performance is greatly reduced. To address the tendency of traditional classifiers to favor the majority class and ignore the minority class, an imbalanced data over-sampling method based on iterative self-organizing data analysis technique algorithm (ISODATA) clustering is proposed. The minority class is divided into sub-clusters by ISODATA, and each sub-cluster is over-sampled according to the sampling ratio, so that the sampled minority class data also conforms to the imbalance of the original minority class data. The new dataset, composed of the new minority class data and the majority class data, is classified with SVM and Random Forest classifiers. Experiments on 12 datasets from KEEL show that the method achieves better G-means and F-value, improving classification accuracy.
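The per-sub-cluster over-sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the sub-cluster labels have already been produced (by ISODATA or any stand-in clustering), it assumes the sampling ratio means allocating new samples in proportion to sub-cluster size, and it uses a simplified SMOTE-style interpolation between two random members of the same sub-cluster rather than a nearest-neighbour search. The function names `allocate_per_cluster` and `oversample_by_cluster` are hypothetical.

```python
import numpy as np

def allocate_per_cluster(cluster_sizes, n_new):
    """Split n_new synthetic samples across sub-clusters in proportion to size.

    Assumption: "sampling ratio" is read as proportional allocation.
    """
    sizes = np.asarray(cluster_sizes, dtype=float)
    raw = sizes / sizes.sum() * n_new
    counts = np.floor(raw).astype(int)
    # Give leftover samples to the sub-clusters with the largest remainders.
    leftover = int(n_new - counts.sum())
    order = np.argsort(-(raw - counts))
    counts[order[:leftover]] += 1
    return counts

def oversample_by_cluster(X_min, labels, n_new, seed=0):
    """Generate n_new synthetic minority samples, sub-cluster by sub-cluster.

    X_min  : (n, d) array of minority class samples
    labels : (n,) sub-cluster label per sample (assumed given, e.g. from ISODATA)
    """
    rng = np.random.default_rng(seed)
    cluster_ids = np.unique(labels)
    sizes = [int(np.sum(labels == c)) for c in cluster_ids]
    counts = allocate_per_cluster(sizes, n_new)
    synthetic = []
    for c, k in zip(cluster_ids, counts):
        members = X_min[labels == c]
        for _ in range(k):
            # Simplified SMOTE-style step: interpolate between two random
            # members of the same sub-cluster (no nearest-neighbour search).
            i, j = rng.integers(0, len(members), size=2)
            gap = rng.random()
            synthetic.append(members[i] + gap * (members[j] - members[i]))
    if not synthetic:
        return np.empty((0, X_min.shape[1]))
    return np.vstack(synthetic)
```

Because each synthetic point is interpolated only between members of one sub-cluster, new samples stay inside that sub-cluster's extent, and the proportional allocation keeps the relative sizes of the sub-clusters roughly unchanged after sampling.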
Pages: 1528-1536
Number of pages: 9
Related papers
50 in total
  • [21] Cluster-Based Minority Over-Sampling for Imbalanced Datasets
    Puntumapon, Kamthorn
    Rakthamamon, Thanawin
    Waiyamai, Kitsana
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, E99D (12): 3101 - 3109
  • [22] Diversity and Separable Metrics in Over-Sampling Technique for Imbalanced Data Classification
    Mahmoudi, Shadi
    Moradi, Parham
    Akhlaghian, Fardin
    Moradi, Rizan
    2014 4TH INTERNATIONAL CONFERENCE ON COMPUTER AND KNOWLEDGE ENGINEERING (ICCKE), 2014: 152 - 158
  • [23] Learning from Imbalanced Data Using Over-Sampling and the Firefly Algorithm
    Czarnowski, Ireneusz
    COMPUTATIONAL COLLECTIVE INTELLIGENCE (ICCCI 2021), 2021, 12876 : 373 - 386
  • [24] Abstention-SMOTE: An over-sampling approach for imbalanced data classification
    Zhang, Cheng
    Chen, Yufei
    Liu, Xianhui
    Zhao, Xiaodong
    PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY (ICIT 2017), 2017: 17 - 21
  • [25] RWO-Sampling: A random walk over-sampling approach to imbalanced data classification
    Zhang, Huaxiang
    Li, Mingfang
    INFORMATION FUSION, 2014, 20 : 99 - 116
  • [26] An Over-sampling Method Based on Probability Density Estimation for Imbalanced Datasets Classification
    Cao, Lu
    Zhai, Yi-Kui
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION PROCESSING (ICIIP'16), 2016
  • [27] An Over-sampling Method Based on Margin Theory
    Zhang, Zongtang
    Chen, Zhe
    Dai, Weiguo
    Cheng, Yusheng
    ICMLC 2019: 2019 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING, 2019: 506 - 510
  • [28] Unbalanced data classification based on over-sampling and integrated learning
    Zhang, Yongjun
    Jian, Xiaowen
    2021 ASIA-PACIFIC CONFERENCE ON COMMUNICATIONS TECHNOLOGY AND COMPUTER SCIENCE (ACCTCS 2021), 2021: 332 - 337
  • [29] Improving Diagnostic Performance of a Power Transformer Using an Adaptive Over-Sampling Method for Imbalanced Data
    Tra, Viet
    Bach-Phi Duong
    Kim, Jong-Myon
    IEEE TRANSACTIONS ON DIELECTRICS AND ELECTRICAL INSULATION, 2019, 26 (04) : 1325 - 1333
  • [30] Classifier Learning from Imbalanced Corpus by Autoencoded Over-Sampling
    Park, Eunkyung
    Wong, Raymond K.
    Chu, Victor W.
    PRICAI 2019: TRENDS IN ARTIFICIAL INTELLIGENCE, PT I, 2019, 11670 : 16 - 29