A new boundary-degree-based oversampling method for imbalanced data

Cited by: 5
Authors
Chen, Yueqi [1, 2]
Pedrycz, Witold [3, 4, 5]
Yang, Jie [1, 2]
Affiliations
[1] Dalian Univ Technol, Sch Math Sci, Dalian 116024, Liaoning, Peoples R China
[2] Key Lab Computat Math & Data Intelligence Liaonin, Dalian 116024, Liaoning, Peoples R China
[3] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB T6G 2R3, Canada
[4] Polish Acad Sci, Syst Res Inst, PL-00901 Warsaw, Mazowieckie, Poland
[5] Istinye Univ, Fac Engn & Nat Sci, Dept Comp Engn, Istanbul 34460, Turkiye
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Imbalanced learning; Information entropy; Gradient; Gaussian probability distribution function; Oversampling; SMOTE; CLASSIFICATION; ALGORITHM; FRAMEWORK;
DOI
10.1007/s10489-023-04846-4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Imbalanced data constitute a significant challenge in practical applications, as standard classifiers are usually designed to work on data with balanced class label distributions. One effective approach to the imbalance problem is boundary oversampling, which focuses only on the classification of boundary samples. However, most boundary oversampling methods select boundary samples for oversampling only roughly, without considering the potentially useful boundary characteristics inherent in the majority (negative) class. To overcome this limitation, we propose a novel boundary-degree-based oversampling method (BDO). The originality of BDO stems from using information entropy to quantify, in terms of probability, the degree to which each negative sample can be regarded as a boundary sample. The sigma rule is applied to the quantified boundary degree to determine the negative boundary samples, which are then used to indirectly select the minority (positive) boundary samples for oversampling. In this way, a substantial amount of information hidden in the negative class can be mined. To further transfer the mined information to the oversampling step, BDO iteratively synthesizes aided boundary points along a fraudulent gradient. Finally, oversampling is performed on both the positive boundary samples and the aided boundary points. Experimental results on 15 benchmark imbalanced datasets, two multi-label datasets, and one large-scale dataset, evaluated by G-mean, F-measure, AUC, accuracy, TPR, and TNR, show that BDO performs well and is competitive with several commonly used methods.
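The abstract gives no formulas, so the following is only a rough Python sketch of the general idea of an entropy-based boundary degree followed by a sigma-rule cut, not the authors' BDO algorithm. It assumes, purely for illustration, that the class probability around a negative sample is estimated from the label proportions of its k nearest neighbours and that the sigma rule means "keep samples whose degree exceeds mean + n_sigma * standard deviation". The function names and the parameters k and n_sigma are hypothetical.

```python
import math


def knn_labels(points, labels, i, k):
    """Labels of the k nearest neighbours of points[i] (Euclidean, excluding itself)."""
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points))
        if j != i
    )
    return [labels[j] for _, j in dists[:k]]


def binary_entropy(p):
    """Shannon entropy (base 2) of a two-class distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def boundary_degrees(points, labels, k=3, negative=0):
    """Map each negative sample's index to an entropy-based boundary degree.

    Assumption: the probability of the positive class near a sample is the
    fraction of positive labels among its k nearest neighbours.
    """
    degrees = {}
    for i, y in enumerate(labels):
        if y != negative:
            continue
        neigh = knn_labels(points, labels, i, k)
        p_pos = sum(1 for lab in neigh if lab != negative) / len(neigh)
        degrees[i] = binary_entropy(p_pos)
    return degrees


def sigma_rule_select(degrees, n_sigma=1.0):
    """Indices whose degree exceeds mean + n_sigma * std (assumed form of the rule)."""
    values = list(degrees.values())
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    threshold = mean + n_sigma * std
    return [i for i, v in degrees.items() if v > threshold]
```

On a toy one-dimensional dataset with negatives at 0 through 7 and positives at 8.4, 8.8, and 9.5, only the negative at 7 has a mixed neighbourhood, so it is the only one flagged as a boundary sample under these assumptions. The selection of positive boundary samples and the gradient-guided synthesis of aided boundary points are not sketched here, because the abstract does not describe them in enough detail.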
Pages: 26518-26541
Number of pages: 24