Over-sampling algorithm for imbalanced data classification

Cited by: 0
Authors
XU Xiaolong [1]
CHEN Wen [2]
SUN Yanfei [3]
Affiliations
[1] Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing University of Posts and Telecommunications
[2] Institute of Big Data Research at Yancheng, Nanjing University of Posts and Telecommunications
[3] Office of Scientific R&D, Nanjing University of Posts and Telecommunications
Keywords
imbalanced data; density-based spatial clustering of applications with noise (DBSCAN); synthetic minority over-sampling technique (SMOTE); over-sampling
DOI
Not available
Chinese Library Classification (CLC) number
TP311.13
Subject classification code
1201
Abstract
For imbalanced datasets, the focus of classification is identifying minority-class samples, yet current data mining algorithms perform poorly on such datasets. The synthetic minority over-sampling technique (SMOTE) was designed specifically for learning from imbalanced datasets: it generates synthetic minority-class examples by interpolating between nearby minority-class examples. However, SMOTE suffers from overgeneralization. Density-based spatial clustering of applications with noise (DBSCAN), for its part, does not handle samples near the class borderline rigorously, so we optimize DBSCAN for this problem to make its clustering more reasonable. This paper integrates the optimized DBSCAN with SMOTE and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN divides the minority-class samples into three groups: core samples, borderline samples and noise samples. The minority-class noise samples are then removed so that more effective samples can be synthesized. To make full use of the information in the core and borderline samples, a different over-sampling strategy is applied to each group. Experiments show that DSMOTE achieves better precision, recall and F-value than SMOTE and Borderline-SMOTE.
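The interpolation step that SMOTE, and therefore DSMOTE, builds on can be sketched in pure Python as follows. This is plain SMOTE only: the abstract does not describe DSMOTE's separate strategies for core and borderline samples, so none is reproduced here. The function names `smote` and `k_nearest` and the parameters `n_synthetic` and `k` are illustrative assumptions, not the authors' code.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(idx, pool, k):
    """Return the k nearest neighbours of pool[idx] within pool, excluding itself."""
    others = [p for j, p in enumerate(pool) if j != idx]
    others.sort(key=lambda p: euclidean(pool[idx], p))
    return others[:k]


def smote(minority, n_synthetic, k=5, rng=None):
    """Generate n_synthetic minority samples by plain SMOTE interpolation.

    Each new point lies on the segment between a randomly chosen minority
    sample x and one of its k nearest minority neighbours x_nn:
        x_new = x + lam * (x_nn - x),  with lam drawn uniformly from [0, 1).
    Assumption: this is plain SMOTE, not DSMOTE's group-specific strategies.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    if rng is None:
        rng = random.Random()
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        x_nn = rng.choice(k_nearest(i, minority, k))
        lam = rng.random()
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, x_nn)])
    return synthetic


# Usage: four 2-D minority samples, six synthetic points, k = 2.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_synthetic=6, k=2, rng=random.Random(0))
print(len(new_points))  # prints 6
```

Because every synthetic point is a convex combination of two existing minority samples, it always falls inside the region spanned by the minority class. This is also why noisy minority samples lying among majority samples can drag synthetic points into the wrong region, which is the overgeneralization problem the abstract mentions.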
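The first DSMOTE step, splitting minority samples into core, borderline and noise groups and discarding the noise, can be illustrated with the standard DBSCAN point definitions. The abstract does not say how the authors' optimized DBSCAN differs from the standard one, so this sketch uses plain DBSCAN rules and assumes that only minority-class samples are clustered. `eps` and `min_pts` are the usual DBSCAN radius and density threshold; `partition_minority` and `split_groups` are hypothetical names.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def partition_minority(minority, eps, min_pts):
    """Label each minority sample as 'core', 'border' or 'noise'.

    Assumption: standard DBSCAN definitions, not the paper's optimized DBSCAN.
      - core:   its eps-neighbourhood (itself included) holds >= min_pts samples
      - border: not core, but within eps of at least one core sample
      - noise:  neither core nor border
    """
    n = len(minority)
    # Indices of all samples within eps of sample i (i itself included).
    neighbours = [
        [j for j in range(n) if euclidean(minority[i], minority[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(neighbours[i]) >= min_pts for i in range(n)]
    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbours[i]):
            labels.append("border")
        else:
            labels.append("noise")
    return labels


def split_groups(minority, eps, min_pts):
    """Return (core_samples, border_samples); noise samples are dropped."""
    labels = partition_minority(minority, eps, min_pts)
    core = [x for x, lab in zip(minority, labels) if lab == "core"]
    border = [x for x, lab in zip(minority, labels) if lab == "border"]
    return core, border


# Usage: a dense cluster of four points, one point on its edge, one outlier.
minority = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.35, 0.0], [5.0, 5.0]]
print(partition_minority(minority, eps=0.3, min_pts=4))
# prints ['core', 'core', 'core', 'core', 'border', 'noise']
```

In a DSMOTE-style pipeline, the core and border groups returned by `split_groups` would each be over-sampled with its own strategy. The abstract does not specify those strategies, so this sketch stops at the partition.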
Pages: 1182-1191
Number of pages: 10