Constrained Oversampling: An Oversampling Approach to Reduce Noise Generation in Imbalanced Datasets With Class Overlapping

Cited by: 17
Authors
Liu, Changhui [1]
Jin, Sun [2]
Wang, Donghong [3]
Luo, Zichao [4]
Yu, Jianbo [1]
Zhou, Binghai [1]
Yang, Changlin [5]
Affiliations
[1] Tongji Univ, Sch Mech Engn, Shanghai 201804, Peoples R China
[2] Shanghai Jiao Tong Univ, Sch Mech Engn, Shanghai 200240, Peoples R China
[3] Shanghai Jiao Tong Univ, Sch Mat Sci & Engn, Shanghai 200240, Peoples R China
[4] Tokyo Inst Technol, Yoshino & Yamamoto Lab, Tokyo 1528550, Japan
[5] Northwestern Polytech Univ, State Key Lab Solidificat Proc, Xian 710000, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Classification algorithms; Proposals; Sampling methods; Ant colony optimization; Data models; Prediction algorithms; Data structures; Constrained oversampling; oversampling; class overlapping; imbalanced dataset; ANT COLONY OPTIMIZATION; DATA-SETS; CLASSIFICATION; SMOTE; REGRESSION;
DOI
10.1109/ACCESS.2020.3018911
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Imbalanced datasets are pervasive in classification tasks and degrade classifier performance on minority samples. Oversampling is an effective remedy for class imbalance. However, existing oversampling methods generally introduce noisy examples into the original dataset, especially when the dataset contains class-overlapping regions. This study proposes a new oversampling method, Constrained Oversampling, to reduce noise generation during oversampling. The algorithm first extracts the overlapping regions in the dataset. Second, Ant Colony Optimization is applied to define the boundaries of the minority regions. Third, oversampling under constraints synthesizes new samples to obtain a balanced dataset. The proposal distinguishes itself from other techniques by incorporating constraints into the oversampling process to inhibit noise generation. Experiments show that it outperforms various benchmark oversampling approaches. The effectiveness of the method is explained by studying the impact of class overlapping on imbalanced learning.
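The general idea of oversampling under a constraint can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the details of the overlap extraction or the Ant Colony Optimization boundary step, so the sketch below swaps in a much simpler, assumed rule. It draws SMOTE-style interpolated candidates between minority neighbors and keeps a candidate only if its nearest original sample belongs to the minority class. The function name `constrained_oversample` and all of its parameters are hypothetical.

```python
import numpy as np

def constrained_oversample(X_min, X_maj, n_new, k=5, seed=0, max_tries=10000):
    """Illustrative sketch only (not the paper's method).

    Candidates are interpolated between a minority sample and one of its
    k nearest minority neighbors. A candidate is accepted only if it lies
    at least as close to a minority sample as to any majority sample
    (an assumed stand-in for the paper's boundary constraint).
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, len(X_min) - 1)

    # Pairwise distances among minority samples; exclude self-matches.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]

    synthetic = []
    tries = 0
    while len(synthetic) < n_new and tries < max_tries:
        tries += 1
        i = rng.integers(len(X_min))
        j = neighbors[i, rng.integers(k)]
        # Random point on the segment between sample i and its neighbor j.
        candidate = X_min[i] + rng.random() * (X_min[j] - X_min[i])
        d_min = np.linalg.norm(X_min - candidate, axis=1).min()
        d_maj = np.linalg.norm(X_maj - candidate, axis=1).min()
        if d_min <= d_maj:  # constraint: reject likely noise candidates
            synthetic.append(candidate)

    # May return fewer than n_new rows if max_tries is exhausted.
    return np.array(synthetic).reshape(-1, X_min.shape[1])
```

The acceptance test is the only difference from unconstrained interpolation: in heavily overlapping regions more candidates are rejected, so fewer synthetic points land among majority samples.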
Pages: 91452-91465
Number of pages: 14
Related papers
50 in total
  • [1] An Adaptive Oversampling Technique for Imbalanced Datasets
    Shahee, Shaukat Ali
    Ananthakumar, Usha
    ADVANCES IN DATA MINING: APPLICATIONS AND THEORETICAL ASPECTS (ICDM 2018), 2018, 10933 : 1 - 16
  • [2] LoRAS: an oversampling approach for imbalanced datasets
    Bej, Saptarshi
    Davtyan, Narek
    Wolfien, Markus
    Nassar, Mariam
    Wolkenhauer, Olaf
    MACHINE LEARNING, 2021, 110 (02) : 279 - 301
  • [3] Triplets Oversampling for Class Imbalanced Federated Datasets
    Xiao, Chenguang
    Wang, Shuo
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, ECML PKDD 2023, PT II, 2023, 14170 : 368 - 383
  • [4] An oversampling algorithm for high-dimensional imbalanced learning with class overlapping
    Yang, Xu
    Xue, Zhen
    Zhang, Liangliang
    Wu, Jianzhen
    KNOWLEDGE AND INFORMATION SYSTEMS, 2025, 67 (02) : 1915 - 1943
  • [5] LoRAS: an oversampling approach for imbalanced datasets
    Saptarshi Bej
    Narek Davtyan
    Markus Wolfien
    Mariam Nassar
    Olaf Wolkenhauer
    Machine Learning, 2021, 110 : 279 - 301
  • [6] Overlap to equilibrium: Oversampling imbalanced datasets using overlapping degree
    Jubair, Sidra
    Yang, Jie
    Ali, Bilal
    INFORMATION PROCESSING & MANAGEMENT, 2025, 62 (02)
  • [7] A Synthetic Minority Based on Probabilistic Distribution (SyMProD) Oversampling for Imbalanced Datasets
    Kunakorntum, Intouch
    Hinthong, Woranich
    Phunchongharn, Phond
    IEEE ACCESS, 2020, 8 : 114692 - 114704
  • [8] KNNOR: An oversampling technique for imbalanced datasets
    Islam, Ashhadul
    Belhaouari, Samir Brahim
    Rehman, Atiq Ur
    Bensmail, Halima
    APPLIED SOFT COMPUTING, 2022, 115
  • [9] DEBOHID: A differential evolution based oversampling approach for highly imbalanced datasets
    Kaya, Ersin
    Korkmaz, Sedat
    Sahman, Mehmet Akif
    Cinar, Ahmet Cevahir
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 169
  • [10] AWGAN: An adaptive weighting GAN approach for oversampling imbalanced datasets
    Guan, Shaopeng
    Zhao, Xiaoyan
    Xue, Yuewei
    Pan, Hao
    INFORMATION SCIENCES, 2024, 663