Over-sampling algorithm for imbalanced data classification

Cited by: 0
Authors
XU Xiaolong [1]
CHEN Wen [2]
SUN Yanfei [3]
Affiliations
[1] Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing University of Posts and Telecommunications
[2] Institute of Big Data Research at Yancheng, Nanjing University of Posts and Telecommunications
[3] Office of Scientific R&D, Nanjing University of Posts and Telecommunications
Keywords
imbalanced data; density-based spatial clustering of applications with noise (DBSCAN); synthetic minority oversampling technique (SMOTE); over-sampling
DOI
Not available
Chinese Library Classification number
TP311.13
Discipline classification code
1201
Abstract
For imbalanced datasets, the focus of classification is to identify samples of the minority class. Current data mining algorithms do not perform well enough on imbalanced datasets. The synthetic minority over-sampling technique (SMOTE) is specifically designed for learning from imbalanced datasets: it generates synthetic minority class examples by interpolating between nearby minority class examples. However, SMOTE suffers from the overgeneralization problem. The density-based spatial clustering of applications with noise (DBSCAN) algorithm is not rigorous when dealing with samples near the borderline. We optimize the DBSCAN algorithm for this problem to make the clustering more reasonable. This paper integrates the optimized DBSCAN with SMOTE and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN is used to divide the samples of the minority class into three groups: core samples, borderline samples and noise samples. The noise samples of the minority class are then removed so that more effective samples can be synthesized. To make full use of the information in core samples and borderline samples, different strategies are used to over-sample each of the two groups. Experiments show that DSMOTE achieves better results than SMOTE and Borderline-SMOTE in terms of precision, recall and F-value.
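The pipeline described in the abstract (group minority samples by density, drop noise, then interpolate) can be illustrated with a minimal sketch. This is not the authors' DSMOTE: the abstract does not specify the optimized DBSCAN or the separate over-sampling strategies for core and borderline samples, so the sketch assumes the standard DBSCAN definitions of core, border and noise points and a single placeholder rule (interpolate each kept sample toward its nearest core sample). The names `label_minority`, `interpolate`, `oversample`, `eps` and `min_pts` are hypothetical.

```python
import math
import random


def label_minority(points, eps, min_pts):
    """Label each minority point 'core', 'border' or 'noise'.

    Assumption: standard DBSCAN roles, not the paper's optimized variant.
    """
    # Each eps-neighborhood includes the point itself.
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")  # not dense itself, but next to a core point
        else:
            labels.append("noise")
    return labels


def interpolate(p, q, rng):
    """SMOTE-style synthetic sample on the segment between p and q."""
    gap = rng.random()
    return tuple(a + gap * (b - a) for a, b in zip(p, q))


def oversample(points, eps, min_pts, n_new, seed=0):
    """Drop noise, then synthesize n_new samples from core and border points.

    Assumption: one placeholder rule for both groups (interpolate toward the
    nearest other core sample); the paper uses group-specific strategies.
    """
    rng = random.Random(seed)
    labels = label_minority(points, eps, min_pts)
    kept = [p for p, lab in zip(points, labels) if lab != "noise"]
    core = [p for p, lab in zip(points, labels) if lab == "core"]
    if not core:
        return []
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(kept)
        partner = min(
            (c for c in core if c != base),
            key=lambda c: math.dist(base, c),
            default=base,
        )
        synthetic.append(interpolate(base, partner, rng))
    return synthetic


# Toy minority class: a small dense group, one fringe point, one outlier.
minority = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (10, 10)]
labels = label_minority(minority, eps=1.1, min_pts=3)
new_points = oversample(minority, eps=1.1, min_pts=3, n_new=5)
print(labels)
print(new_points)
```

In this toy run the outlier at (10, 10) is labeled noise and never contributes to a synthetic sample, which is the effect the abstract attributes to removing minority-class noise before over-sampling.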
Pages: 1182-1191
Number of pages: 10
Related papers
50 in total
  • [21] An efficient algorithm coupled with synthetic minority over-sampling technique to classify imbalanced PubChem BioAssay data
    Hao, Ming
    Wang, Yanli
    Bryant, Stephen H.
    [J]. ANALYTICA CHIMICA ACTA, 2014, 806 : 117 - 127
  • [22] Deep convolutional neural networks with genetic algorithm-based synthetic minority over-sampling technique for improved imbalanced data classification
    Alex, Suja A.
    Nayahi, J. Jesu Vedha
    Kaddoura, Sanaa
    [J]. APPLIED SOFT COMPUTING, 2024, 156
  • [23] Affine combination-based over-sampling for imbalanced regression
    Li, Zhen-Zhen
    Huang, Niu
    Yi, Lun-Zhao
    Fu, Guang-Hui
    [J]. JOURNAL OF CHEMOMETRICS, 2024, 38 (03)
  • [25] KA-Ensemble: towards imbalanced image classification ensembling under-sampling and over-sampling
    Ding, Hao
    Wei, Bin
    Gu, Zhaorui
    Yu, Zhibin
    Zheng, Haiyong
    Zheng, Bing
    Li, Juan
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (21-22) : 14871 - 14888
  • [26] Weighted Ensemble with one-class Classification and Over-sampling and Instance selection (WECOI): An approach for learning from imbalanced data streams
    Czarnowski, Ireneusz
    [J]. JOURNAL OF COMPUTATIONAL SCIENCE, 2022, 61
  • [27] A Novel Borderline Over-Sampling Method Based on KNN and Deep Gaussian Mixture Model for Imbalanced Data
    Zhang, H.
    Xiao, H.
    Yi, C.
    Yuan, R.
    [J]. Data Analysis and Knowledge Discovery, 2023, 7 (05) : 116 - 122
  • [28] Handling Autism Imbalanced Data using Synthetic Minority Over-Sampling Technique (SMOTE)
    El-Sayed, Asmaa Ahmed
    Meguid, Nagwa Abdel
    Mahmood, Mahmood Abdel Manem
    Hefny, Hesham Ahmed
    [J]. PROCEEDINGS OF 2015 THIRD IEEE WORLD CONFERENCE ON COMPLEX SYSTEMS (WCCS), 2015
  • [29] An adaptive over-sampling method for imbalanced data based on simultaneous clustering and filtering noisy
    Chen, Wei
    Guo, Wenjie
    Mao, Weijie
    [J]. APPLIED INTELLIGENCE, 2024, 54 (22) : 11430 - 11449
  • [30] AN IMBALANCED SIGNAL MODULATION CLASSIFICATION AND EVALUATION METHOD BASED ON SYNTHETIC MINORITY OVER-SAMPLING TECHNIQUE
    Liu, Xuebo
    Wang, Yiran
    Bai, Jing
    Li, Haoran
    Wang, Xu
    [J]. IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023 : 6224 - 6227