A new boundary-degree-based oversampling method for imbalanced data

Cited by: 5
Authors
Chen, Yueqi [1, 2]
Pedrycz, Witold [3, 4, 5]
Yang, Jie [1, 2]
Affiliations
[1] Dalian Univ Technol, Sch Math Sci, Dalian 116024, Liaoning, Peoples R China
[2] Key Lab Computat Math & Data Intelligence Liaonin, Dalian 116024, Liaoning, Peoples R China
[3] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB T6G 2R3, Canada
[4] Polish Acad Sci, Syst Res Inst, PL-00901 Warsaw, Mazowieckie, Poland
[5] Istinye Univ, Fac Engn & Nat Sci, Dept Comp Engn, Istanbul 34460, Turkiye
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Imbalanced learning; Information entropy; Gradient; Gaussian probability distribution function; Oversampling; SMOTE; CLASSIFICATION; ALGORITHM; FRAMEWORK;
DOI
10.1007/s10489-023-04846-4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Imbalanced data pose a significant challenge in practical applications, because standard classifiers are usually designed for data with balanced class distributions. One effective way to address the imbalance problem is boundary oversampling, which focuses only on the classification of boundary samples. However, most boundary oversampling methods select boundary samples only roughly and do not consider the potentially useful boundary characteristics inherent in the majority (negative) class. To overcome this limitation, we propose a novel boundary-degree-based oversampling method (BDO). The originality of BDO stems from using information entropy to quantify, in terms of probability, the degree to which each negative sample can be regarded as a boundary sample. By applying the sigma rule to the quantified boundary degree, negative boundary samples are identified and then used to indirectly select minority (positive) boundary samples for oversampling. In this way, a substantial amount of information hidden in the negative class can be mined. To further transfer the mined information to the oversampling step, BDO iteratively synthesizes aided boundary points along a fraudulent gradient. Finally, oversampling is performed on both the positive boundary samples and the aided boundary points. Experiments on 15 benchmark imbalanced datasets, two multi-label datasets and one large-scale dataset, evaluated in terms of G-mean, F-measure, AUC, accuracy, TPR and TNR, show that BDO achieves better performance and is competitive with several commonly considered methods.
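The abstract does not state BDO's formulas, so the following is only an illustrative sketch of the general idea of an entropy-based boundary degree followed by a sigma-rule cut. It assumes, purely for illustration, that the degree of a negative sample is the binary entropy of the class mix among its k nearest neighbours, and that the sigma rule keeps samples whose degree exceeds the mean by `n_sigma` standard deviations. The function names, `k`, `n_sigma` and `negative_label` are hypothetical and are not taken from the paper.

```python
import numpy as np


def boundary_degree(X, y, k=5, negative_label=0):
    """Assumed boundary degree: entropy of the class mix among the
    k nearest neighbours of each negative sample (not the paper's formula)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    neg_idx = np.flatnonzero(y == negative_label)

    # Euclidean distances from every negative sample to every sample.
    dists = np.linalg.norm(X[neg_idx, None, :] - X[None, :, :], axis=2)
    # A sample must not count as its own neighbour.
    dists[np.arange(len(neg_idx)), neg_idx] = np.inf

    # Indices of the k nearest neighbours of each negative sample.
    nn = np.argsort(dists, axis=1)[:, :k]

    # Fraction of positive neighbours, and the complementary negative fraction.
    p_pos = (y[nn] != negative_label).mean(axis=1)
    p = np.stack([p_pos, 1.0 - p_pos], axis=1)

    # Binary entropy in bits, treating 0 * log(0) as 0.
    safe = np.where(p > 0, p, 1.0)
    entropy = -(p * np.log2(safe)).sum(axis=1)
    return neg_idx, entropy


def select_boundary(neg_idx, degree, n_sigma=1.0):
    """Assumed sigma rule: keep samples whose degree exceeds
    mean + n_sigma * standard deviation."""
    threshold = degree.mean() + n_sigma * degree.std()
    return neg_idx[degree > threshold]


if __name__ == "__main__":
    # Toy 1-D data: eight negatives at 0..7, two positives at 8 and 9.
    X = [[float(i)] for i in range(10)]
    y = [0] * 8 + [1] * 2
    neg_idx, degree = boundary_degree(X, y, k=2)
    print(select_boundary(neg_idx, degree))
```

On the toy data, only the negative sample adjacent to the positive class has a mixed neighbourhood, so it is the only one with non-zero entropy. The positive boundary selection and gradient-guided synthesis steps described in the abstract are not sketched here, because the abstract gives too little detail to reconstruct them.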
Pages: 26518-26541
Number of pages: 24