A new boundary-degree-based oversampling method for imbalanced data

Cited by: 5
Authors
Chen, Yueqi [1 ,2 ]
Pedrycz, Witold [3 ,4 ,5 ]
Yang, Jie [1 ,2 ]
Affiliations
[1] Dalian Univ Technol, Sch Math Sci, Dalian 116024, Liaoning, Peoples R China
[2] Key Lab Computat Math & Data Intelligence Liaonin, Dalian 116024, Liaoning, Peoples R China
[3] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB T6G 2R3, Canada
[4] Polish Acad Sci, Syst Res Inst, PL-00901 Warsaw, Mazowieckie, Poland
[5] Istinye Univ, Fac Engn & Nat Sci, Dept Comp Engn, Istanbul 34460, Turkiye
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Imbalanced learning; Information entropy; Gradient; Gaussian probability distribution function; Oversampling; SMOTE; CLASSIFICATION; ALGORITHM; FRAMEWORK;
DOI
10.1007/s10489-023-04846-4
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Imbalanced data pose a significant challenge in practical applications, as standard classifiers are usually designed for data with balanced class label distributions. One effective way to address the imbalance problem is boundary oversampling, which focuses only on the classification of boundary samples. However, most boundary oversampling methods select boundary samples for oversampling only roughly, without considering the potentially useful boundary characteristics inherent in the majority (negative) class. To overcome this limitation, we propose a novel boundary-degree-based oversampling method (BDO) in this paper. The originality of BDO stems from using information entropy to quantify, in terms of probability, the degree to which each negative sample can be regarded as a boundary sample. By applying the sigma rule to the quantified boundary degree, negative boundary samples are identified and then used to indirectly select minority (positive) boundary samples for oversampling. In this way, a substantial amount of information hidden in the negative class can be mined. To further transfer the mined information to the oversampling step, BDO iteratively synthesizes aided boundary points along a fraudulent gradient. Finally, oversampling is performed on both the positive boundary samples and the aided boundary points. Experiments on 15 benchmark imbalanced datasets, two multi-label datasets and one large-scale dataset, evaluated in terms of G-mean, F-measure, AUC, accuracy, TPR and TNR, show that BDO performs better than or competitively with some commonly considered methods.
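The entropy-based boundary degree and the sigma rule mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's definition: the abstract does not say how the probabilities are estimated or how the sigma rule is parameterized. The sketch assumes the class fractions among each negative sample's k nearest neighbours as the probabilities, and reads the sigma rule as "degree above mean + n_sigma * standard deviation". The function names boundary_degree and select_boundary_negatives, and the parameters k and n_sigma, are hypothetical.

```python
import numpy as np

def boundary_degree(X, y, k=5, negative_label=0):
    """Entropy (in bits) of the class mix among each negative sample's
    k nearest neighbours. ASSUMPTION: kNN class fractions stand in for
    the probabilities; the paper may estimate them differently."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    neg_idx = np.flatnonzero(y == negative_label)
    # Euclidean distances from each negative sample to every sample.
    d = np.linalg.norm(X[neg_idx, None, :] - X[None, :, :], axis=2)
    d[np.arange(len(neg_idx)), neg_idx] = np.inf  # exclude the sample itself
    nn = np.argsort(d, axis=1)[:, :k]
    p_pos = (y[nn] != negative_label).mean(axis=1)
    p = np.stack([p_pos, 1.0 - p_pos], axis=1)
    # Binary Shannon entropy, treating 0 * log(0) as 0.
    logp = np.log2(np.where(p > 0, p, 1.0))
    degree = -(p * logp).sum(axis=1)
    return neg_idx, degree

def select_boundary_negatives(neg_idx, degree, n_sigma=1.0):
    """ASSUMPTION: the sigma rule is read as a mean + n_sigma * std cutoff."""
    threshold = degree.mean() + n_sigma * degree.std()
    return neg_idx[degree > threshold]

# Toy 1-D data: ten negatives far from the positives, two negatives next to them.
X = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 98, 99, 100, 101], dtype=float).reshape(-1, 1)
y = np.array([0] * 12 + [1] * 2)
neg_idx, degree = boundary_degree(X, y, k=4)
selected = select_boundary_negatives(neg_idx, degree)
print(selected)
```

On this toy data only the two negatives adjacent to the positive samples have mixed neighbourhoods, so they are the ones flagged. The choice of k controls how local the estimate is; small k gives coarse probability values.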
Pages: 26518 - 26541
Number of pages: 24
Related papers
50 in total
  • [1] A new boundary-degree-based oversampling method for imbalanced data
    Yueqi Chen
    Witold Pedrycz
    Jie Yang
    Applied Intelligence, 2023, 53 : 26518 - 26541
  • [2] A new imbalanced data oversampling method based on Bootstrap method and Wasserstein Generative Adversarial Network
    Hou, Binjie
    Chen, Gang
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2024, 21 (03) : 4309 - 4327
  • [3] Imbalanced Learning with Oversampling based on Classification Contribution Degree
    Jiang, Zhenhao
    Yang, Jie
    Liu, Yan
    ADVANCED THEORY AND SIMULATIONS, 2021, 4 (05)
  • [4] A New Oversampling Method Based on the Classification Contribution Degree
    Jiang, Zhenhao
    Pan, Tingting
    Zhang, Chao
    Yang, Jie
    SYMMETRY-BASEL, 2021, 13 (02) : 1 - 13
  • [5] New Oversampling Approaches Based on Polynomial Fitting for Imbalanced Data Sets
    Gazzah, Sami
    Ben Amara, Najoua Essoukri
    PROCEEDINGS OF THE 8TH IAPR INTERNATIONAL WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS, 2008, : 677 - 684
  • [6] A NEW RESAMPLING METHOD OF IMBALANCED LARGE DATA BASED ON CLASS BOUNDARY
    Xing Sheng
    Zhai Junhai
    Wang Xiaolan
    Yuan Ming
    PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOL. 2, 2015, : 826 - 831
  • [7] Oversampling Method for Imbalanced Data Using Credible Counterfactual
    Gao, Feng
    Song, Mei
    Zhu, Yi
    Computer Engineering and Applications, 2024, 60 (05) : 165 - 171
  • [8] An oversampling method for multi-class imbalanced data based on composite weights
    Deng, Mingyang
    Guo, Yingshi
    Wang, Chang
    Wu, Fuwei
    PLOS ONE, 2021, 16 (11):
  • [9] A quantum-based oversampling method for classification of highly imbalanced and overlapped data
    Yang, Bei
    Tian, Guilan
    Luttrell, Joseph
    Gong, Ping
    Zhang, Chaoyang
    EXPERIMENTAL BIOLOGY AND MEDICINE, 2023, 248 (24) : 2500 - 2513
  • [10] Gaussian Distribution Based Oversampling for Imbalanced Data Classification
    Xie, Yuxi
    Qiu, Min
    Zhang, Haibo
    Peng, Lizhi
    Chen, Zhenxiang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (02) : 667 - 679