Constrained Oversampling: An Oversampling Approach to Reduce Noise Generation in Imbalanced Datasets With Class Overlapping

Cited by: 17
Authors
Liu, Changhui [1]
Jin, Sun [2]
Wang, Donghong [3]
Luo, Zichao [4]
Yu, Jianbo [1]
Zhou, Binghai [1]
Yang, Changlin [5]
Affiliations
[1] Tongji Univ, Sch Mech Engn, Shanghai 201804, Peoples R China
[2] Shanghai Jiao Tong Univ, Sch Mech Engn, Shanghai 200240, Peoples R China
[3] Shanghai Jiao Tong Univ, Sch Mat Sci & Engn, Shanghai 200240, Peoples R China
[4] Tokyo Inst Technol, Yoshino & Yamamoto Lab, Tokyo 1528550, Japan
[5] Northwestern Polytech Univ, State Key Lab Solidificat Proc, Xian 710000, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Classification algorithms; Proposals; Sampling methods; Ant colony optimization; Data models; Prediction algorithms; Data structures; Constrained oversampling; oversampling; class overlapping; imbalanced dataset; ANT COLONY OPTIMIZATION; DATA-SETS; CLASSIFICATION; SMOTE; REGRESSION;
DOI
10.1109/ACCESS.2020.3018911
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Imbalanced datasets are pervasive in classification tasks and degrade classifier performance on minority samples. Oversampling is an effective remedy for class imbalance. However, existing oversampling methods generally introduce noisy examples into the original dataset, especially when it contains class-overlapping regions. This study proposes a new oversampling method, Constrained Oversampling, to reduce noise generation during oversampling. The algorithm first extracts the overlapping regions in the dataset. Second, Ant Colony Optimization is applied to define the boundaries of the minority regions. Third, oversampling under constraints synthesizes new samples to produce a balanced dataset. The proposal distinguishes itself from other techniques by incorporating constraints into the oversampling process to inhibit noise generation. Experiments show that it outperforms various benchmark oversampling approaches. The method's effectiveness is explained by studying the impact of class overlapping on imbalanced learning.
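The idea of synthesizing minority samples under a constraint can be illustrated with a minimal sketch. This is not the authors' algorithm: the Ant Colony Optimization boundary search is replaced here by a much simpler, assumed rule (a candidate is kept only if its nearest original sample belongs to the minority class), and the function name `constrained_oversample` and all of its parameters are hypothetical.

```python
import numpy as np

def constrained_oversample(X, y, minority=1, k=5, n_new=None, max_tries=10000, seed=0):
    """SMOTE-style interpolation with a rejection constraint (illustrative only).

    Assumption: a candidate counts as noise if its nearest original sample
    is a majority sample. The paper instead uses ACO-defined boundaries.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority]
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    if n_new is None:
        # Number of synthetic samples needed to balance the two groups.
        n_new = int((y != minority).sum() - len(X_min))
    k_eff = min(k, len(X_min) - 1)

    synthetic = []
    tries = 0
    while len(synthetic) < n_new and tries < max_tries:
        tries += 1
        a = X_min[rng.integers(len(X_min))]
        # Nearest minority neighbours of a; position 0 is a itself (distance 0).
        order = np.argsort(np.linalg.norm(X_min - a, axis=1))
        b = X_min[rng.choice(order[1:k_eff + 1])]
        candidate = a + rng.random() * (b - a)  # point on the segment a-b
        # Constraint: reject candidates that land closest to a majority sample.
        nearest = np.argmin(np.linalg.norm(X - candidate, axis=1))
        if y[nearest] == minority:
            synthetic.append(candidate)

    if not synthetic:
        return X, y
    X_res = np.vstack([X, np.array(synthetic)])
    y_res = np.concatenate([y, np.full(len(synthetic), minority, dtype=y.dtype)])
    return X_res, y_res

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X_maj = rng.normal(0.0, 1.0, size=(60, 2))
    X_mino = rng.normal(1.5, 1.0, size=(12, 2))  # overlaps the majority cloud
    X = np.vstack([X_maj, X_mino])
    y = np.array([0] * 60 + [1] * 12)
    X_res, y_res = constrained_oversample(X, y)
    print("minority before:", int((y == 1).sum()), "after:", int((y_res == 1).sum()))
```

When the classes overlap heavily, the rejection rule may discard many candidates, so the result can fall short of full balance once `max_tries` is exhausted. That trade-off (fewer synthetic samples in exchange for fewer noisy ones) is the point of constraining the sampler.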
Pages: 91452-91465
Page count: 14
Related papers
50 in total
  • [21] Newton cooling theorem-based local overlapping regions cleaning and oversampling techniques for imbalanced datasets
    Tao, Liangliang
    Wang, Qingya
    Yu, Fen
    Cao, Hui
    Liang, Yage
    Luo, Huixia
    Guo, Jinghui
    NEUROCOMPUTING, 2025, 616
  • [22] Research on Oversampling Algorithm for Imbalanced Datasets Based On ARIMA Model
    Chen, Gang
    Guo, Xiaomei
    PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021), 2021, : 2384 - 2389
  • [23] Radial-Based Approach to Imbalanced Data Oversampling
    Koziarski, Michal
    Krawczyk, Bartosz
    Wozniak, Michal
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2017, 2017, 10334 : 318 - 327
  • [24] Noise-Robust Gaussian Distribution Based Imbalanced Oversampling
    Shao, Xuetao
    Yan, Yuanting
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2023, PT II, 2024, 14488 : 221 - 234
  • [25] A NOVEL RULE-BASED OVERSAMPLING APPROACH FOR IMBALANCED DATA CLASSIFICATION
    Zhang, Xiao
    Paz, Ivan
    Nebot, Angela
    37TH ANNUAL EUROPEAN SIMULATION AND MODELLING CONFERENCE 2023, ESM 2023, 2023, : 208 - 212
  • [26] A Boundary-Information-Based Oversampling Approach to Improve Learning Performance for Imbalanced Datasets
    Li, Der-Chiang
    Shi, Qi-Shi
    Lin, Yao-San
    Lin, Liang-Sian
    ENTROPY, 2022, 24 (03)
  • [27] Selective oversampling approach for strongly imbalanced data
    Gnip P.
    Vokorokos L.
    Drotár P.
    PeerJ Computer Science, 2021, 7 : 1 - 22
  • [28] Selective oversampling approach for strongly imbalanced data
    Gnip, Peter
    Vokorokos, Liberios
    Drotar, Peter
    PEERJ COMPUTER SCIENCE, 2021,
  • [29] Adaptive semi-unsupervised weighted oversampling (A-SUWO) for imbalanced datasets
    Nekooeimehr, Iman
    Lai-Yuen, Susana K.
    EXPERT SYSTEMS WITH APPLICATIONS, 2016, 46 : 405 - 416
  • [30] A-RDBOTE: an improved oversampling technique for imbalanced credit-scoring datasets
    Lenka, Sudhansu R.
    Bisoy, Sukant Kishoro
    Priyadarshini, Rojalina
    RISK MANAGEMENT-AN INTERNATIONAL JOURNAL, 2023, 25 (04):