A new Monte Carlo sampling method based on Gaussian Mixture Model for imbalanced data classification

Cited by: 0
Authors
Chen, Gang [1]
Hou, Binjie [1]
Lei, Tiangang [1]
Affiliations
[1] Dalian Maritime Univ, Dept Math, Dalian 116026, Peoples R China
Keywords
imbalanced data; Monte Carlo sampling; probability density function; oversampling technique; Gaussian Mixture Model; SMOTE; SUPPORT;
DOI
10.3934/mbe.2023794
Chinese Library Classification number
Q [Biological Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
Imbalanced data classification has been a major topic in the machine learning community. Various approaches have been proposed in recent years, and researchers have paid considerable attention to data-level and algorithm-level techniques. However, existing methods often generate samples in specific regions without considering the complexity of imbalanced distributions. This can lead learning models to overemphasize certain difficult factors in the minority data. In this paper, a Monte Carlo sampling algorithm based on the Gaussian Mixture Model (MCS-GMM) is proposed. In MCS-GMM, we use the Gaussian Mixture Model to fit the distribution of the imbalanced data and apply the Monte Carlo algorithm to generate new data. Then, to reduce the impact of data overlap, the three-sigma rule is used to divide the data into four types, and the weight of each minority class instance is determined from its neighbors and the probability density function. In experiments on Knowledge Extraction based on Evolutionary Learning (KEEL) datasets, our method is shown to be effective and outperforms existing approaches such as the Synthetic Minority Over-sampling TEchnique (SMOTE).
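The fit-then-sample idea described in the abstract can be illustrated with a small sketch. This is not the authors' MCS-GMM implementation: it works in one dimension only, uses a plain EM fit written for this example, and leaves out the three-sigma partition into four types and the instance weighting. The function names `fit_gmm_1d` and `sample_gmm_1d`, the component count `k=2`, and the toy minority data are all assumptions made for illustration.

```python
import math
import random


def fit_gmm_1d(xs, k=2, iters=50, seed=0):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    rng = random.Random(seed)
    n = len(xs)
    means = rng.sample(xs, k)  # initialise the means from the data
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                1e-6,  # variance floor to avoid a collapsed component
            )
    return weights, means, variances


def sample_gmm_1d(params, n_new, seed=1):
    """Monte Carlo sampling: pick a component by weight, then draw from its Gaussian."""
    weights, means, variances = params
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        j = rng.choices(range(len(weights)), weights=weights)[0]
        out.append(rng.gauss(means[j], math.sqrt(variances[j])))
    return out


# Toy minority class drawn from two clusters (made-up data for the example)
data_rng = random.Random(42)
minority = [data_rng.gauss(-2.0, 0.5) for _ in range(30)]
minority += [data_rng.gauss(3.0, 0.8) for _ in range(20)]

params = fit_gmm_1d(minority, k=2)
synthetic = sample_gmm_1d(params, n_new=100)
```

Sampling from the fitted density, rather than interpolating between neighbouring points as SMOTE does, lets new points follow the overall shape of the minority distribution. For real multivariate data, a library implementation of Gaussian mixtures would likely be a better choice than this hand-written EM loop.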
Pages: 17866 - 17885
Number of pages: 20
Related papers
50 in total
  • [1] A Synthetic Minority Oversampling Technique Based on Gaussian Mixture Model Filtering for Imbalanced Data Classification
    Xu, Zhaozhao
    Shen, Derong
    Kou, Yue
    Nie, Tiezheng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (03) : 3740 - 3753
  • [2] A Gaussian mixture model based combined resampling algorithm for classification of imbalanced credit data sets
    Han, Xu
    Cui, Runbang
    Lan, Yanfei
    Kang, Yanzhe
    Deng, Jiang
    Jia, Ning
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2019, 10 (12) : 3687 - 3699
  • [3] A Gaussian Mixture Based Boosted Classification Scheme For Imbalanced And Oversampled Data
    Pal, Biprodip
    Paul, Mahit Kumar
    2017 INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND COMMUNICATION ENGINEERING (ECCE), 2017, : 401 - 405
  • [4] A Novel Borderline Over-Sampling Method Based on KNN and Deep Gaussian Mixture Model for Imbalanced Data
    Zhang H.
    Xiao H.
    Yi C.
    Yuan R.
    Data Analysis and Knowledge Discovery, 2023, 7 (05) : 116 - 122
  • [5] Adaptive synthetic sampling of imbalanced data based on variation Bayesian-optimized Gaussian mixture model
    Liu J.-P.
    Yang B.-F.
    Zhou J.-M.
    Xu P.-F.
    Kongzhi yu Juece/Control and Decision, 2023, 38 (06) : 1653 - 1660
  • [6] Gaussian Mixture Based Semi Supervised Boosting For Imbalanced Data Classification
    Paul, Mahit Kumar
    Pal, Biprodip
    2016 2ND INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER & TELECOMMUNICATION ENGINEERING (ICECTE), 2016,
  • [7] A New Combination Sampling Method for Imbalanced Data
    Li, Hu
    Zou, Peng
    Wang, Xiang
    Xia, Rongze
    PROCEEDINGS OF 2013 CHINESE INTELLIGENT AUTOMATION CONFERENCE: INTELLIGENT INFORMATION PROCESSING, 2013, 256 : 547 - 554
  • [8] Multimodal Biometric Score Fusion Using Gaussian Mixture Model and Monte Carlo Method
    R Raghavendra
    Rao Ashok
    G Hemantha Kumar
    Journal of Computer Science & Technology, 2010, 25 (04) : 771 - 782