A novel synthetic minority oversampling technique based on relative and absolute densities for imbalanced classification

Cited by: 11
Authors
Liu, Ruijuan [1]
Affiliations
[1] Chongqing Jianzhu Coll, Dept Publ Course, Chongqing 400072, Peoples R China
Funding
UK Research and Innovation;
Keywords
Class-imbalance learning; Class-imbalance classification; Oversampling; K nearest neighbors; Relative density; BORDERLINE-SMOTE; SAMPLING METHOD; ALGORITHM;
DOI
10.1007/s10489-022-03512-5
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning a classifier from class-imbalanced data is an important challenge. Among the existing solutions, SMOTE is widely praised and has an extensive range of practical applications. However, SMOTE and its extensions often degrade because of noise generation and within-class imbalance. Although multiple variants of SMOTE have been developed, few of them solve both problems at the same time. Moreover, many improvements to SMOTE rely on advanced models that introduce external parameters. To address imbalance between and within classes while avoiding noise generation, a novel synthetic minority oversampling technique based on relative and absolute densities is proposed. First, a novel noise filter based on relative density is proposed to remove noise and smooth the class boundary. Second, sparsity and boundary weights are proposed and calculated from relative and absolute densities, respectively. Third, normalized weights based on absolute and sparse weights are proposed to generate more synthetic minority-class samples in the class boundary and sparse regions. The main advantages of the proposed algorithm are: (a) it effectively avoids noise generation while removing noise and smoothing the class boundary in the original data; (b) it generates more synthetic samples in class boundaries and sparse regions; (c) no additional parameters are introduced. Extensive experiments show that the proposed method, SMOTE-RD, outperforms 7 popular oversampling methods in average AUC, average F-measure and average G-mean on real data sets at an acceptable time cost.
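The weighted generation step described in the abstract can be illustrated with a minimal sketch. This is not the SMOTE-RD algorithm: the abstract does not give the relative or absolute density formulas, so the hypothetical helper `sparsity_weights` below uses the mean distance to the k nearest minority neighbors as a stand-in score, and `weighted_smote` applies ordinary SMOTE interpolation with seed samples drawn in proportion to those weights. The function names, the parameter `k`, and the weighting rule are all assumptions made for illustration.

```python
import numpy as np


def _pairwise_distances(X):
    """Euclidean distance matrix with the diagonal set to infinity."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return d


def sparsity_weights(X_min, k=5):
    """Hypothetical stand-in weight (NOT the paper's density definition):
    mean distance to the k nearest minority neighbors, normalized to sum to 1.
    Samples in sparser regions receive larger weights."""
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)
    d = _pairwise_distances(X_min)
    mean_knn_dist = np.sort(d, axis=1)[:, :k].mean(axis=1)
    return mean_knn_dist / mean_knn_dist.sum()


def weighted_smote(X_min, weights, n_new, k=5, seed=0):
    """SMOTE-style interpolation in which seed samples are drawn with
    probability proportional to `weights` (assumed non-negative)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    # Indices of the k nearest minority neighbors of every minority sample.
    nn = np.argsort(_pairwise_distances(X_min), axis=1)[:, :k]
    base = rng.choice(n, size=n_new, p=p)              # weighted seed selection
    neigh = nn[base, rng.integers(0, k, size=n_new)]   # one random neighbor per seed
    gap = rng.random((n_new, 1))                       # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])
```

For example, `weighted_smote(X_min, sparsity_weights(X_min), n_new=100)` would place more synthetic points near isolated minority samples. A noise filter, which the abstract applies before weighting, is deliberately left out of this sketch.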
Pages: 786-803
Number of pages: 18