Over-sampling algorithm for imbalanced data classification

Cited by: 0
Authors
XU Xiaolong [1]
CHEN Wen [2]
SUN Yanfei [3]
Affiliations
[1] Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing University of Posts and Telecommunications
[2] Institute of Big Data Research at Yancheng, Nanjing University of Posts and Telecommunications
[3] Office of Scientific R&D, Nanjing University of Posts and Telecommunications
Keywords
imbalanced data; density-based spatial clustering of applications with noise (DBSCAN); synthetic minority over-sampling technique (SMOTE); over-sampling
DOI
Not available
Chinese Library Classification (CLC) number
TP311.13
Discipline classification code
1201
Abstract
For imbalanced datasets, the focus of classification is to identify samples of the minority class. Current data mining algorithms do not perform well enough on imbalanced datasets. The synthetic minority over-sampling technique (SMOTE) is specifically designed for learning from imbalanced datasets; it generates synthetic minority class examples by interpolating between nearby minority class examples. However, SMOTE suffers from the overgeneralization problem. The density-based spatial clustering of applications with noise (DBSCAN) algorithm is not rigorous when dealing with samples near the borderline. We optimize the DBSCAN algorithm for this problem to make clustering more reasonable. This paper integrates the optimized DBSCAN with SMOTE and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN divides the samples of the minority class into three groups: core samples, borderline samples, and noise samples. The noise samples of the minority class are then removed so that more effective samples can be synthesized. To make full use of the information in core samples and borderline samples, different strategies are used to over-sample each group. Experiments show that DSMOTE achieves better results than SMOTE and Borderline-SMOTE in terms of precision, recall, and F-value.
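The pipeline described in the abstract (label minority samples as core, borderline, or noise; drop the noise; over-sample the two remaining groups differently) can be illustrated with a minimal sketch. It is not the authors' DSMOTE: the abstract gives neither the DBSCAN optimization nor the two over-sampling strategies, so the code below assumes plain DBSCAN labelling rules and one illustrative choice of strategy (core samples interpolate toward any neighbor, borderline samples only toward core neighbors). The names `label_minority` and `dsmote_sketch` and the parameters `eps`, `min_pts`, `n_new`, and `seed` are hypothetical.

```python
import math
import random


def label_minority(points, eps, min_pts):
    """Label each minority sample 'core', 'border', or 'noise' (plain DBSCAN rules)."""
    # neighbors[i] holds the indices of all other points within eps of point i.
    neighbors = [
        [j for j, q in enumerate(points) if j != i and math.dist(p, q) <= eps]
        for i, p in enumerate(points)
    ]
    # A core sample has at least min_pts samples (itself included) within eps.
    core = {i for i, nb in enumerate(neighbors) if len(nb) + 1 >= min_pts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")  # not core, but within eps of a core sample
        else:
            labels.append("noise")
    return labels, neighbors


def dsmote_sketch(points, eps, min_pts, n_new, seed=0):
    """Generate n_new synthetic minority samples; noise samples are never used."""
    rng = random.Random(seed)
    labels, neighbors = label_minority(points, eps, min_pts)
    partners = {}  # seed index -> indices it may interpolate toward
    for i, lab in enumerate(labels):
        if lab == "core":
            # Assumed strategy: any neighbor (neighbors of a core sample are never noise).
            cand = neighbors[i]
        elif lab == "border":
            # Assumed strategy: only core neighbors, pulling new samples into the cluster.
            cand = [j for j in neighbors[i] if labels[j] == "core"]
        else:
            continue  # noise samples are removed
        if cand:
            partners[i] = cand
    if not partners:
        return []
    seeds = sorted(partners)
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(seeds)
        j = rng.choice(partners[i])
        gap = rng.random()  # SMOTE-style interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(points[i], points[j]))
        )
    return synthetic


minority = [(0, 0), (0, 1), (1, 0), (1, 1), (2.2, 1), (10, 10)]
new_samples = dsmote_sketch(minority, eps=1.5, min_pts=4, n_new=20)
```

In this toy input the isolated point at (10, 10) is labelled noise, so no synthetic sample is generated near it; every new sample lies on a segment between two retained minority samples.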
Pages: 1182 - 1191
Page count: 10
Related papers
50 in total
  • [11] Abstention-SMOTE: An over-sampling approach for imbalanced data classification
    Zhang, Cheng
    Chen, Yufei
    Liu, Xianhui
    Zhao, Xiaodong
    PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY (ICIT 2017), 2017, : 17 - 21
  • [12] Learning from Imbalanced Data Using Over-Sampling and the Firefly Algorithm
    Czarnowski, Ireneusz
    COMPUTATIONAL COLLECTIVE INTELLIGENCE (ICCCI 2021), 2021, 12876 : 373 - 386
  • [13] RWO-Sampling: A random walk over-sampling approach to imbalanced data classification
    Zhang, Huaxiang
    Li, Mingfang
    INFORMATION FUSION, 2014, 20 : 99 - 116
  • [14] An overlapping minimization-based over-sampling algorithm for binary imbalanced classification
    Lu, Xuan
    Ye, Xuan
    Cheng, Yingchao
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [15] A Normal Distribution-Based Over-Sampling Approach to Imbalanced Data Classification
    Zhang, Huaxiang
    Wang, Zhichao
    ADVANCED DATA MINING AND APPLICATIONS, PT I, 2011, 7120 : 83 - 96
  • [16] Imbalanced data classification using improved synthetic minority over-sampling technique
    Anusha, Yamijala
    Visalakshi, R.
    Srinivas, Konda
    MULTIAGENT AND GRID SYSTEMS, 2023, 19 (02) : 117 - 131
  • [17] Deep Over-sampling Framework for Classifying Imbalanced Data
    Ando, Shin
    Huang, Chun Yuan
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2017, PT I, 2017, 10534 : 770 - 785
  • [18] Classification of imbalanced PubChem BioAssay data using an efficient algorithm coupled with synthetic minority over-sampling technique
    Hao, Ming
    Wang, Yanli
    Bryant, Stephen H.
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2014, 247
  • [19] Over-sampling methods for mixed data in imbalanced problems
    Alonso, Hugo
    da Costa, Joaquim Fernando Pinto
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2024
  • [20] A priori synthetic over-sampling methods for increasing classification sensitivity in imbalanced data sets
    Rivera, William A.
    Xanthopoulos, Petros
    EXPERT SYSTEMS WITH APPLICATIONS, 2016, 66 : 124 - 135