A new Monte Carlo sampling method based on Gaussian Mixture Model for imbalanced data classification

Authors
Chen, Gang [1]
Hou, Binjie [1]
Lei, Tiangang [1]
Affiliation
[1] Dalian Maritime Univ, Dept Math, Dalian 116026, Peoples R China
Keywords
imbalanced data; Monte Carlo sampling; probability density function; oversampling technique; Gaussian Mixture Model; SMOTE; SUPPORT
DOI
10.3934/mbe.2023794
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Imbalanced data classification has been a major topic in the machine learning community. In recent years, different approaches have been taken to address the issue, and researchers have paid considerable attention to data-level and algorithm-level techniques. However, existing methods often generate samples in specific regions without considering the complexity of imbalanced distributions. This can lead learning models to overemphasize certain difficult factors in the minority data. In this paper, a Monte Carlo sampling algorithm based on the Gaussian Mixture Model (MCS-GMM) is proposed. In MCS-GMM, we use the Gaussian Mixture Model to fit the distribution of the imbalanced data and apply the Monte Carlo algorithm to generate new data. Then, to reduce the impact of data overlap, the three-sigma rule is used to divide the data into four types, and the weight of each minority class instance is computed based on its neighbors and the probability density function. Experiments on Knowledge Extraction based on Evolutionary Learning (KEEL) datasets show that our method is effective and outperforms existing approaches such as the Synthetic Minority Over-sampling TEchnique (SMOTE).
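The core idea in the abstract (fit a Gaussian mixture to the minority class, then draw synthetic points from it by Monte Carlo sampling) can be illustrated with a minimal sketch. This is not the authors' MCS-GMM implementation: the function names `fit_gmm`, `sample_gmm`, and `mixture_pdf`, the plain EM fitting routine, the number of components, and the toy data are all assumptions made for illustration, and the three-sigma partition and instance weighting steps are not reproduced because the abstract does not specify them.

```python
import numpy as np


def gaussian_pdf(X, mean, cov):
    """Multivariate normal density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    inv = np.linalg.inv(cov)
    maha_sq = np.einsum("ij,jk,ik->i", diff, inv, diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * maha_sq) / norm


def fit_gmm(X, k, n_iter=100, reg=1e-6, seed=0):
    """Plain EM for a k-component full-covariance GMM (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)]
    covs = np.array([np.cov(X, rowvar=False) + reg * np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        dens = np.stack(
            [w * gaussian_pdf(X, m, c) for w, m, c in zip(weights, means, covs)],
            axis=1,
        )
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: re-estimate weights, means, and covariances
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return weights, means, covs


def sample_gmm(weights, means, covs, n_samples, seed=0):
    """Monte Carlo sampling: pick a component, then draw from its Gaussian."""
    rng = np.random.default_rng(seed)
    comps = rng.choice(len(weights), size=n_samples, p=weights / weights.sum())
    return np.array([rng.multivariate_normal(means[c], covs[c]) for c in comps])


def mixture_pdf(X, weights, means, covs):
    """Density of the fitted mixture at each row of X."""
    return sum(w * gaussian_pdf(X, m, c) for w, m, c in zip(weights, means, covs))


# Toy minority class with two clusters (made-up data, not from the paper)
data_rng = np.random.default_rng(42)
minority = np.vstack([
    data_rng.normal([0, 0], 0.5, size=(30, 2)),
    data_rng.normal([4, 4], 0.5, size=(30, 2)),
])
n_majority = 300  # assumed majority class size

weights, means, covs = fit_gmm(minority, k=2)
synthetic = sample_gmm(weights, means, covs, n_majority - len(minority))
balanced_minority = np.vstack([minority, synthetic])
```

In this sketch, `mixture_pdf` gives the fitted density at any point, which is the kind of quantity a density-based instance weight could draw on; how the paper actually combines it with neighborhood information is not stated in the abstract.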
Pages: 17866-17885
Page count: 20
Related papers
50 in total
  • [41] Imbalanced Data Classification Method Based on Ensemble Learning
    Xiang, Yu
    Xie, Yongping
    COMMUNICATIONS, SIGNAL PROCESSING, AND SYSTEMS, CSPS 2018, VOL III: SYSTEMS, 2020, 517 : 18 - 24
  • [42] Adaptive Fusion Based Method for Imbalanced Data Classification
    Liang, Zefeng
    Wang, Huan
    Yang, Kaixiang
    Shi, Yifan
    FRONTIERS IN NEUROROBOTICS, 2022, 16
  • [43] A Classification Method Based on Feature Selection for Imbalanced Data
    Liu, Yi
    Wang, Yanzhen
    Ren, Xiaoguang
    Zhou, Hao
    Diao, Xingchun
    IEEE ACCESS, 2019, 7 : 81794 - 81807
  • [44] A Parsimonious Mixture of Gaussian Trees Model for Oversampling in Imbalanced and Multimodal Time-Series Classification
    Cao, Hong
    Tan, Vincent Y. F.
    Pang, John Z. F.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2014, 25 (12) : 2226 - 2239
  • [45] MOGT: OVERSAMPLING WITH A PARSIMONIOUS MIXTURE OF GAUSSIAN TREES MODEL FOR IMBALANCED TIME-SERIES CLASSIFICATION
    Pang, John Z. F.
    Cao, Hong
    Tan, Vincent Y. F.
    2013 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2013,
  • [46] A new approach for imbalanced data classification based on data gravitation
    Peng, Lizhi
    Zhang, Hongli
    Yang, Bo
    Chen, Yuehui
    INFORMATION SCIENCES, 2014, 288 : 347 - 373
  • [47] A GAN-based hybrid sampling method for imbalanced customer classification
    Zhu, Bing
    Pan, Xin
    vanden Broucke, Seppe
    Xiao, Jin
    INFORMATION SCIENCES, 2022, 609 : 1397 - 1411
  • [48] Safe sample screening based sampling method for imbalanced data
    Shi, H.
    Liu, Y.
    Ji, S.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2019, 32 (06) : 545 - 556
  • [49] A hybrid imbalanced classification model based on data density
    Shi, Shengnan
    Li, Jie
    Zhu, Dan
    Yang, Fang
    Xu, Yong
    INFORMATION SCIENCES, 2023, 624 : 50 - 67
  • [50] CVAE-Based Hybrid Sampling Data Augmentation Method and Interpretation for Imbalanced Classification of Gout Disease
    Si, Xiaonan
    Fu, Yifan
    Liu, Xinran
    Wang, Rulin
    Xu, Wenchang
    Wang, Lei
    ADVANCED INTELLIGENT COMPUTING IN BIOINFORMATICS, PT I, ICIC 2024, 2024, 14881 : 49 - 60