A new Monte Carlo sampling method based on Gaussian Mixture Model for imbalanced data classification

Cited by: 0
Authors
Chen, Gang [1]
Hou, Binjie [1]
Lei, Tiangang [1]
Affiliation
[1] Dalian Maritime Univ, Dept Math, Dalian 116026, Peoples R China
Keywords
imbalanced data; Monte Carlo sampling; probability density function; oversampling technique; Gaussian Mixture Model; SMOTE; SUPPORT;
DOI
10.3934/mbe.2023794
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Imbalanced data classification has been a major topic in the machine learning community. In recent years, many approaches have been proposed to address the problem, and researchers have paid considerable attention to both data-level and algorithm-level techniques. However, existing methods often generate samples in specific regions without considering the complexity of imbalanced distributions. This can lead learning models to overemphasize certain difficult factors in the minority data. In this paper, a Monte Carlo sampling algorithm based on the Gaussian Mixture Model (MCS-GMM) is proposed. In MCS-GMM, we use the Gaussian Mixture Model to fit the distribution of the imbalanced data and apply the Monte Carlo algorithm to generate new data. Then, to reduce the impact of data overlap, the three-sigma rule is used to divide the data into four types, and the weight of each minority class instance is computed from its neighbors and the probability density function. Experiments on Knowledge Extraction based on Evolutionary Learning (KEEL) datasets show that our method is effective and outperforms existing approaches such as the Synthetic Minority Over-sampling TEchnique (SMOTE).
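The core idea in the abstract (fit a Gaussian mixture to the data, then draw new samples from it by Monte Carlo) can be illustrated with a minimal one-dimensional sketch. This is not the authors' MCS-GMM implementation: the function names `fit_gmm_1d`, `sample_gmm`, and `within_three_sigma` are hypothetical, the mixture is fitted with a plain EM loop using only the Python standard library, and the three-sigma check here is a simplified filter that stands in for, and does not reproduce, the paper's four-type division and instance weighting.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(data, k=2, iters=50):
    """Fit a k-component 1D Gaussian mixture with plain EM (illustrative only)."""
    srt = sorted(data)
    n = len(srt)
    # Initialise means at evenly spaced quantiles, with a shared std and equal weights.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(srt) / n
    std0 = math.sqrt(sum((x - overall_mean) ** 2 for x in srt) / n) or 1.0
    stds = [std0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], stds[j]) for j in range(k)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            stds[j] = max(math.sqrt(var), 1e-6)
    return weights, means, stds


def sample_gmm(weights, means, stds, n_samples, rng):
    """Monte Carlo sampling: pick a component by weight, then draw from its Gaussian."""
    out = []
    for _ in range(n_samples):
        j = rng.choices(range(len(weights)), weights=weights)[0]
        out.append(rng.gauss(means[j], stds[j]))
    return out


def within_three_sigma(x, means, stds):
    """Assumed simplification: keep x if it lies within 3 std of any component mean."""
    return any(abs(x - m) <= 3.0 * s for m, s in zip(means, stds))


# Toy minority class drawn from two clusters (synthetic data for illustration).
rng = random.Random(0)
minority = [rng.gauss(-2.0, 0.5) for _ in range(30)]
minority += [rng.gauss(3.0, 0.8) for _ in range(30)]

weights, means, stds = fit_gmm_1d(minority, k=2)
candidates = sample_gmm(weights, means, stds, 100, rng)
synthetic = [x for x in candidates if within_three_sigma(x, means, stds)]
```

In a real setting the data would be multivariate and the mixture would more likely be fitted with an established library; the sketch only shows how sampling from a fitted density differs from interpolating between neighbors as SMOTE does.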
Pages: 17866-17885
Page count: 20