Over-sampling algorithm for imbalanced data classification

Cited by: 0
Authors
XU Xiaolong [1]
CHEN Wen [2]
SUN Yanfei [3]
Affiliations
[1] Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing University of Posts and Telecommunications
[2] Institute of Big Data Research at Yancheng, Nanjing University of Posts and Telecommunications
[3] Office of Scientific R&D, Nanjing University of Posts and Telecommunications
Keywords
imbalanced data; density-based spatial clustering of applications with noise (DBSCAN); synthetic minority over-sampling technique (SMOTE); over-sampling
DOI
Not available
Chinese Library Classification number
TP311.13
Discipline classification code
1201
Abstract
For imbalanced datasets, the focus of classification is to identify samples of the minority class. Current data mining algorithms do not perform well enough on imbalanced datasets. The synthetic minority over-sampling technique (SMOTE) is specifically designed for learning from imbalanced datasets: it generates synthetic minority class examples by interpolating between nearby minority class examples. However, SMOTE suffers from the overgeneralization problem. Density-based spatial clustering of applications with noise (DBSCAN) is not rigorous when dealing with samples near the borderline. We optimize the DBSCAN algorithm for this problem to make the clustering more reasonable. This paper integrates the optimized DBSCAN with SMOTE and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN is used to divide the samples of the minority class into three groups: core samples, borderline samples and noise samples. The noise samples of the minority class are then removed so that more effective samples can be synthesized. To make full use of the information in core samples and borderline samples, different strategies are used to over-sample each of the two groups. Experiments show that DSMOTE achieves better results than SMOTE and Borderline-SMOTE in terms of precision, recall and F-value.
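The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' DSMOTE: it labels minority samples with the standard DBSCAN definitions of core, border and noise points (the paper's optimized DBSCAN is not described here), drops the noise samples, and applies the same plain SMOTE interpolation to core and borderline samples alike, because the abstract does not specify the separate strategies. All function names and the parameters eps, min_pts and k are assumptions chosen for illustration.

```python
import math
import random


def label_samples(points, eps, min_pts):
    """Label each point 'core', 'border' or 'noise' (standard DBSCAN definitions)."""
    # Neighborhood of each point: indices within eps, the point itself included.
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(n) >= min_pts for n in neighbors]
    labels = []
    for i, n in enumerate(neighbors):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in n):
            labels.append("border")  # not dense itself, but near a core point
        else:
            labels.append("noise")
    return labels


def smote_interpolate(x, neighbor, rng):
    """Return a synthetic point on the segment between x and neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return tuple(a + gap * (b - a) for a, b in zip(x, neighbor))


def oversample_minority(points, eps, min_pts, n_new, k=3, seed=0):
    """Drop noise samples, then synthesize n_new points by SMOTE interpolation.

    Assumption: core and border samples are treated identically here; the
    paper uses different strategies for the two groups.
    """
    rng = random.Random(seed)
    labels = label_samples(points, eps, min_pts)
    kept = [p for p, lab in zip(points, labels) if lab != "noise"]
    synthetic = []
    if len(kept) < 2:
        return labels, synthetic  # nothing to interpolate between
    for _ in range(n_new):
        i = rng.randrange(len(kept))
        # k nearest kept samples to kept[i], excluding kept[i] itself.
        others = sorted(
            (j for j in range(len(kept)) if j != i),
            key=lambda j: math.dist(kept[i], kept[j]),
        )[:k]
        synthetic.append(smote_interpolate(kept[i], kept[rng.choice(others)], rng))
    return labels, synthetic


# Toy minority class: a dense square, one point on its edge, one far outlier.
minority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.2, 1.0), (10.0, 10.0)]
labels, synthetic = oversample_minority(minority, eps=1.5, min_pts=4, n_new=5)
```

On this toy data the outlier at (10.0, 10.0) is labeled as noise and excluded, so every synthetic point lies between retained minority samples rather than being pulled toward the outlier.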
Pages: 1182-1191
Page count: 10
Related papers
50 in total
[41]   Margin-Based Over-Sampling Method for Learning from Imbalanced Datasets [J].
Fan, Xiannian ;
Tang, Ke ;
Weise, Thomas .
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT II: 15TH PACIFIC-ASIA CONFERENCE, PAKDD 2011, 2011, 6635 :309-320
[42]   An Improved Over-sampling Algorithm based on iForest and SMOTE [J].
Zheng, Yifeng ;
Li, Guohe ;
Zhang, Teng .
2019 8TH INTERNATIONAL CONFERENCE ON SOFTWARE AND COMPUTER APPLICATIONS (ICSCA 2019), 2019, :75-80
[43]   Searching for Optimal Oversampling to Process Imbalanced Data: Generative Adversarial Networks and Synthetic Minority Over-Sampling Technique [J].
Eom, Gayeong ;
Byeon, Haewon .
MATHEMATICS, 2023, 11 (16)
[44]   Applying Adaptive Over-sampling Technique Based on Data Density and Cost-Sensitive SVM to Imbalanced Learning [J].
Wang, Senzhang ;
Li, Zhoujun ;
Chao, Wenhan ;
Cao, Qinghua .
2012 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2012,
[45]   AFNFS: Adaptive fuzzy neighborhood-based feature selection with adaptive synthetic over-sampling for imbalanced data [J].
Sun, Lin ;
Li, Mengmeng ;
Ding, Weiping ;
Zhang, En ;
Mu, Xiaoxia ;
Xu, Jiucheng .
INFORMATION SCIENCES, 2022, 612 :724-744
[46]   Overly optimistic prediction results on imbalanced data: a case study of flaws and benefits when applying over-sampling [J].
Vandewiele, Gilles ;
Dehaene, Isabelle ;
Kovacs, Gyorgy ;
Sterckx, Lucas ;
Janssens, Olivier ;
Ongenae, Femke ;
Backere, Femke De ;
Turck, Filip De ;
Roelens, Kristien ;
Decruyenaere, Johan ;
Van Hoecke, Sofie ;
Demeester, Thomas .
ARTIFICIAL INTELLIGENCE IN MEDICINE, 2021, 111
[47]   Chrysanthemum Abnormal Petal Type Classification using Random Forest and Over-sampling [J].
Yuan, Peisen ;
Ren, Shougang ;
Xu, Huanliang ;
Chen, Jin .
PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018, :275-278
[48]   Combine Sampling Support Vector Machine for Imbalanced Data Classification [J].
Sain, Hartayuni ;
Purnami, Santi Wulan .
THIRD INFORMATION SYSTEMS INTERNATIONAL CONFERENCE 2015, 2015, 72 :59-66
[49]   Exploratory parallel hybrid sampling framework for imbalanced data classification [J].
Zheng, Ming ;
Zhao, Zhuo ;
Wang, Fei ;
Hu, Xiaowen ;
Xu, Sheng ;
Li, Wanggen ;
Li, Tong .
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 138
[50]   To combat multi-class imbalanced problems by means of over-sampling and boosting techniques [J].
Abdi, Lida ;
Hashemi, Sattar .
SOFT COMPUTING, 2015, 19 (12) :3369-3385