Over-sampling algorithm for imbalanced data classification

Cited by: 0

Authors:

XU Xiaolong [1]

CHEN Wen [2]

SUN Yanfei [3]

Affiliations:

[1] Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing University of Posts and Telecommunications

[2] Institute of Big Data Research at Yancheng, Nanjing University of Posts and Telecommunications

[3] Office of Scientific R&D, Nanjing University of Posts and Telecommunications

Source:

Journal of Systems Engineering and Electronics | 2019, Vol. 30, No. 6

Keywords:

imbalanced data; density-based spatial clustering of applications with noise (DBSCAN); synthetic minority over-sampling technique (SMOTE); over-sampling

DOI:

Not available

Chinese Library Classification (CLC) number:

TP311.13

Subject classification code:

1201

Abstract:

For imbalanced datasets, the focus of classification is to identify samples of the minority class. The performance of current data mining algorithms is not good enough for processing imbalanced datasets. The synthetic minority over-sampling technique (SMOTE) is specifically designed for learning from imbalanced datasets; it generates synthetic minority class examples by interpolating between nearby minority class examples. However, SMOTE suffers from the overgeneralization problem. The density-based spatial clustering of applications with noise (DBSCAN) algorithm is not rigorous when dealing with samples near the borderline. We optimize the DBSCAN algorithm for this problem to make clustering more reasonable. This paper integrates the optimized DBSCAN and SMOTE, and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN is used to divide the samples of the minority class into three groups: core samples, borderline samples and noise samples. The noise samples of the minority class are then removed so that more effective samples can be synthesized. To make full use of the information in core samples and borderline samples, different strategies are used to over-sample each group. Experiments show that DSMOTE achieves better results than SMOTE and Borderline-SMOTE in terms of precision, recall and F-value.
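
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' DSMOTE: it uses the standard DBSCAN definitions of core, border and noise points rather than the paper's optimized variant, and it applies one plain SMOTE-style interpolation to core and borderline samples alike, because the abstract does not state the two group-specific strategies. The function names `label_minority` and `smote_like`, and the parameters `eps`, `min_pts` and `k`, are hypothetical names chosen for this example.

```python
import math
import random


def label_minority(points, eps, min_pts):
    """Label each minority sample 'core', 'border' or 'noise'.

    Standard DBSCAN definitions (assumption: not the paper's optimized rule).
    A neighborhood includes the point itself.
    """
    neighborhoods = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighborhoods) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighborhoods):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")  # not dense itself, but near a core point
        else:
            labels.append("noise")
    return labels


def smote_like(points, labels, n_new, k, rng):
    """Drop noise samples, then interpolate between a random kept sample
    and one of its k nearest kept neighbors (plain SMOTE-style step)."""
    kept = [p for p, lab in zip(points, labels) if lab != "noise"]
    if len(kept) < 2:
        raise ValueError("need at least two non-noise samples")
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(kept))
        base = kept[base_idx]
        others = [q for j, q in enumerate(kept) if j != base_idx]
        others.sort(key=lambda q: math.dist(base, q))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy minority class: a small dense group, one fringe point, one outlier.
minority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (2.0, 1.0), (10.0, 10.0)]
labels = label_minority(minority, eps=1.1, min_pts=3)
synthetic = smote_like(minority, labels, n_new=5, k=2, rng=random.Random(0))
```

On this toy data the four clustered points come out as core, `(2.0, 1.0)` as border and `(10.0, 10.0)` as noise, so every synthetic sample is interpolated only between the five retained points. A faithful DSMOTE implementation would replace both the labeling rule and the single interpolation step with the procedures given in the full paper.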

Pages: 1182-1191

Number of pages: 10
