A new Monte Carlo sampling method based on Gaussian Mixture Model for imbalanced data classification

Cited by: 0
Authors
Chen, Gang [1]
Hou, Binjie [1]
Lei, Tiangang [1]
Affiliations
[1] Dalian Maritime Univ, Dept Math, Dalian 116026, Peoples R China
Keywords
imbalanced data; Monte Carlo sampling; probability density function; oversampling technique; Gaussian Mixture Model; SMOTE; SUPPORT
DOI
10.3934/mbe.2023794
Chinese Library Classification number
Q [Biological sciences]
Discipline classification code
07; 0710; 09
Abstract
Imbalanced data classification has been a major topic in the machine learning community. Many approaches to the problem have been proposed in recent years, and researchers have paid particular attention to data-level and algorithm-level techniques. However, existing methods often generate samples in specific regions without considering the complexity of imbalanced distributions. This can lead learning models to overemphasize certain difficult factors in the minority data. In this paper, a Monte Carlo sampling algorithm based on the Gaussian Mixture Model (MCS-GMM) is proposed. In MCS-GMM, we use the Gaussian Mixture Model to fit the distribution of the imbalanced data and apply the Monte Carlo algorithm to generate new data. Then, to reduce the impact of data overlap, the three-sigma rule is used to divide the data into four types, and the weight of each minority class instance is computed from its neighbors and the probability density function. Experiments conducted on Knowledge Extraction based on Evolutionary Learning (KEEL) datasets show that our method is effective and outperforms existing approaches such as the Synthetic Minority Over-sampling TEchnique (SMOTE).
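The first two steps named in the abstract (fit a Gaussian mixture to the data, then draw new points from it by Monte Carlo sampling) can be illustrated with a minimal sketch. This is not the authors' implementation: it handles one-dimensional data only, uses a plain EM fit written with the Python standard library, and omits the three-sigma partition and the instance weighting entirely. The function names `fit_gmm_1d` and `sample_gmm_1d`, the component count `k=2`, and the toy minority data are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(data, k=2, n_iter=100, seed=0):
    """Fit a k-component 1D Gaussian mixture with plain EM (illustrative)."""
    rng = random.Random(seed)
    means = rng.sample(data, k)  # initialise means from random data points
    center = sum(data) / len(data)
    spread = math.sqrt(sum((x - center) ** 2 for x in data) / len(data))
    stds = [max(spread, 1e-6)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and standard deviations
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            stds[j] = max(math.sqrt(var), 1e-6)
    return weights, means, stds


def sample_gmm_1d(weights, means, stds, n, seed=0):
    """Monte Carlo sampling: pick a component by weight, then draw from it."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        j = rng.choices(range(len(weights)), weights=weights)[0]
        samples.append(rng.gauss(means[j], stds[j]))
    return samples


# Toy minority class with two clusters (made-up data, for illustration only)
toy_rng = random.Random(42)
minority = ([toy_rng.gauss(-2.0, 0.5) for _ in range(30)]
            + [toy_rng.gauss(3.0, 0.8) for _ in range(20)])

weights, means, stds = fit_gmm_1d(minority, k=2)
synthetic = sample_gmm_1d(weights, means, stds, n=100)
```

Sampling from the fitted mixture, rather than interpolating between neighboring minority points as SMOTE does, lets the synthetic data follow the estimated density of the whole minority distribution. In practice a multivariate fit from an established library would likely replace this hand-written EM loop.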
Pages: 17866-17885
Number of pages: 20
Related papers
50 in total
  • [31] A design of information granule-based under-sampling method in imbalanced data classification
    Tianyu Liu
    Xiubin Zhu
    Witold Pedrycz
    Zhiwu Li
    Soft Computing, 2020, 24 : 17333 - 17347
  • [32] A histogram SMOTE-based sampling algorithm with incremental learning for imbalanced data classification
    Liaw, Lawrence Chuin Ming
    Tan, Shing Chiang
    Goh, Pey Yun
    Lim, Chee Peng
    INFORMATION SCIENCES, 2025, 686
  • [33] Exploratory parallel hybrid sampling framework for imbalanced data classification
    Zheng, Ming
    Zhao, Zhuo
    Wang, Fei
    Hu, Xiaowen
    Xu, Sheng
    Li, Wanggen
    Li, Tong
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 138
  • [34] Imbalanced Chinese Text Classification Based on Weighted Sampling
    Li, Hu
    Zou, Peng
    Han, WeiHong
    Xia, Rongze
    TRUSTWORTHY COMPUTING AND SERVICES, 2014, 426 : 38 - 45
  • [35] Imbalanced Data Classification Based on Clustering
    Li, Hu
    Zou, Peng
    Han, Weihong
    Xia, Rongze
    COMPUTER-AIDED DESIGN, MANUFACTURING, MODELING AND SIMULATION III, 2014, 443 : 741 - 745
  • [36] An Under-Sampling Method with Support Vectors in Multi-class Imbalanced Data Classification
    Arafat, Md. Yasir
    Hoque, Sabera
    Xu, Shuxiang
    Farid, Dewan Md.
    2019 13TH INTERNATIONAL CONFERENCE ON SOFTWARE, KNOWLEDGE, INFORMATION MANAGEMENT AND APPLICATIONS (SKIMA), 2019
  • [37] A new sampling method for classifying imbalanced data based on support vector machine ensemble
    Jian, Chuanxia
    Gao, Jian
    Ao, Yinhui
    NEUROCOMPUTING, 2016, 193 : 115 - 122
  • [38] A cluster-based hybrid sampling approach for imbalanced data classification
    Feng, Shou
    Zhao, Chunhui
    Fu, Ping
    REVIEW OF SCIENTIFIC INSTRUMENTS, 2020, 91 (05)
  • [39] Denoise-Based Over-Sampling for Imbalanced Data Classification
    Dan, Wang
    Yian, Liu
    2020 19TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND APPLICATIONS FOR BUSINESS ENGINEERING AND SCIENCE (DCABES 2020), 2020 : 275 - 278
  • [40] Improving Power System Risk Evaluation Method Using Monte Carlo Simulation and Gaussian Mixture Method
    Mousavi, Omid A.
    Farashbashi-Astaneh, Mostafa S.
    Gharehpetian, Gevork B.
    ADVANCES IN ELECTRICAL AND COMPUTER ENGINEERING, 2009, 9 (02) : 38 - 44