A new Monte Carlo sampling method based on Gaussian Mixture Model for imbalanced data classification

Cited by: 0

Authors:

Chen, Gang [1]

Hou, Binjie [1]

Lei, Tiangang [1]

Affiliations:

[1] Dalian Maritime Univ, Dept Math, Dalian 116026, Peoples R China

Source:

MATHEMATICAL BIOSCIENCES AND ENGINEERING | 2023 / Volume 20 / Issue 10

Keywords:

imbalanced data; Monte Carlo sampling; probability density function; oversampling technique; Gaussian Mixture Model; SMOTE; SUPPORT;

DOI:

10.3934/mbe.2023794

Chinese Library Classification (CLC) number:

Q [Biological Sciences];

Subject classification codes:

07 ; 0710 ; 09 ;

Abstract:

Imbalanced data classification has been a major topic in the machine learning community. Various approaches have been proposed to address it in recent years, and researchers have paid particular attention to data-level and algorithm-level techniques. However, existing methods often generate samples in specific regions without considering the complexity of imbalanced distributions, which can lead learning models to overemphasize certain difficult factors in the minority data. In this paper, a Monte Carlo sampling algorithm based on the Gaussian Mixture Model (MCS-GMM) is proposed. In MCS-GMM, we use the Gaussian Mixture Model to fit the distribution of the imbalanced data and apply the Monte Carlo algorithm to generate new data. Then, to reduce the impact of data overlap, the three-sigma rule is used to divide the data into four types, and the weight of each minority class instance is determined based on its neighbors and the probability density function. In experiments on Knowledge Extraction based on Evolutionary Learning (KEEL) datasets, our method is shown to be effective and outperforms existing approaches such as the Synthetic Minority Over-sampling TEchnique (SMOTE).
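The two core steps named in the abstract, fitting a Gaussian mixture to the data and then drawing new points from it by Monte Carlo sampling, can be illustrated with a minimal sketch. This is not the authors' MCS-GMM implementation: it is a one-dimensional toy version using only the Python standard library, the function names `fit_gmm_1d` and `sample_gmm_1d` are hypothetical, the quantile-based initialisation is an assumption made for determinism, and the three-sigma partition and instance weighting are omitted because the abstract does not specify them.

```python
import math
import random


def fit_gmm_1d(data, k=2, iters=50):
    """Fit a 1-D Gaussian mixture with plain EM; returns (weights, means, stds)."""
    xs = sorted(data)
    n = len(xs)
    # Assumed initialisation: means at evenly spaced quantiles of the data.
    means = [xs[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    spread = (xs[-1] - xs[0]) or 1.0
    stds = [spread / (2 * k)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                    for w, m, s in zip(weights, means, stds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            stds[j] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component
    return weights, means, stds


def sample_gmm_1d(weights, means, stds, n, seed=1):
    """Monte Carlo draw: pick a component by weight, then sample its Gaussian."""
    rng = random.Random(seed)
    picks = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(means[j], stds[j]) for j in picks]


# Toy "minority class" with two clusters (synthetic data for illustration only).
rng = random.Random(42)
minority = ([rng.gauss(-2.0, 0.5) for _ in range(30)]
            + [rng.gauss(3.0, 0.8) for _ in range(30)])

weights, means, stds = fit_gmm_1d(minority, k=2)
synthetic = sample_gmm_1d(weights, means, stds, n=100)
```

Because the new points are drawn from the fitted density rather than interpolated between neighbouring instances, they can fall anywhere the mixture assigns probability mass. For multi-dimensional data, a library implementation such as scikit-learn's `GaussianMixture` (with its `fit` and `sample` methods) would be the more practical choice.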


Pages: 17866-17885

Page count: 20
