Constrained Oversampling: An Oversampling Approach to Reduce Noise Generation in Imbalanced Datasets With Class Overlapping

被引:17
|
作者
Liu, Changhui [1 ]
Jin, Sun [2 ]
Wang, Donghong [3 ]
Luo, Zichao [4 ]
Yu, Jianbo [1 ]
Zhou, Binghai [1 ]
Yang, Changlin [5 ]
机构
[1] Tongji Univ, Sch Mech Engn, Shanghai 201804, Peoples R China
[2] Shanghai Jiao Tong Univ, Sch Mech Engn, Shanghai 200240, Peoples R China
[3] Shanghai Jiao Tong Univ, Sch Mat Sci & Engn, Shanghai 200240, Peoples R China
[4] Tokyo Inst Technol, Yoshino & Yamamoto Lab, Tokyo 1528550, Japan
[5] Northwestern Polytech Univ, State Key Lab Solidificat Proc, Xian 710000, Peoples R China
基金
中国国家自然科学基金; 中国博士后科学基金;
关键词
Classification algorithms; Proposals; Sampling methods; Ant colony optimization; Data models; Prediction algorithms; Data structures; Constrained oversampling; oversampling; class overlapping; imbalanced dataset; ANT COLONY OPTIMIZATION; DATA-SETS; CLASSIFICATION; SMOTE; REGRESSION;
D O I
10.1109/ACCESS.2020.3018911
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Imbalanced datasets are pervasive in classification tasks and would cause degradation of the performance of classifiers in predicting minority samples. Oversampling is effective in resolving the class imbalance problem. However, existing oversampling methods generally introduce noise examples into original datasets, especially when the datasets contain class overlapping regions. In this study, a new oversampling method named Constrained Oversampling is proposed to reduce noise generation in oversampling. This algorithm first extracts overlapping regions in the dataset. Then Ant Colony Optimization is applied to define the boundaries of minority regions. Third, oversampling under constraints is employed to synthesize new samples to get a balanced dataset. Our proposal distinguishes itself from other techniques by incorporating constraints in the oversampling process to inhibit noise generation. Experiments show that it outperforms various benchmark oversampling approaches. The explanation for the effectiveness of our method is given by studying the impact of class overlapping on imbalanced learning.
引用
收藏
页码:91452 / 91465
页数:14
相关论文
共 50 条
  • [41] LSMOTE: A link-based Synthetic Minority Oversampling Technique for binary imbalanced datasets
    Cai, Qin-Nan
    Zhang, Zhong-Liang
    Wu, Yu-Heng
    Zhang, Xiu-Ming
    NEUROCOMPUTING, 2024, 608
  • [42] Oversampling adversarial network for class-imbalanced fault diagnosis
    Zareapoor, Masoumeh
    Shamsolmoali, Pourya
    Yang, Jie
    MECHANICAL SYSTEMS AND SIGNAL PROCESSING, 2021, 149
  • [43] Evolutionary Mahalanobis Distance-Based Oversampling for Multi-Class Imbalanced Data Classification
    Yao, Leehter
    Lin, Tung-Bin
    SENSORS, 2021, 21 (19)
  • [44] NCLWO: Newton's cooling law-based weighted oversampling algorithm for imbalanced datasets with feature noise
    Tao, Liangliang
    Wang, Qingya
    Zhu, Zhicheng
    Yu, Fen
    Yin, Xia
    NEUROCOMPUTING, 2024, 610
  • [45] Development of a Neighborhood Based Adaptive Heterogeneous Oversampling Ensemble Classifier for Imbalanced Binary Class Datasets
    Subbulaxmi, S. Santha
    Arumugam, G.
    PERVASIVE COMPUTING AND SOCIAL NETWORKING, ICPCSN 2022, 2023, 475 : 353 - 361
  • [46] A novel adaptive boundary weighted and synthetic minority oversampling algorithm for imbalanced datasets
    Song, Xudong
    Chen, Yilin
    Liang, Pan
    Wan, Xiaohui
    Cui, Yunxian
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 44 (02) : 3245 - 3259
  • [47] Fuzzy-synthetic minority oversampling technique: Oversampling based on fuzzy set theory for Android malware detection in imbalanced datasets
    Xu, Yanping
    Wu, Chunhua
    Zheng, Kangfeng
    Niu, Xinxin
    Yang, Yixian
    INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS, 2017, 13 (04):
  • [48] CCO: A Cluster Core-Based Oversampling Technique for Improved Class-Imbalanced Learning
    Mondal, Priyobrata
    Ansari, Faizanuddin
    Das, Swagatam
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, : 1 - 13
  • [49] An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets
    Kovacs, Gyorgy
    APPLIED SOFT COMPUTING, 2019, 83
  • [50] An Adaptive and Robust Method for Oriented Oversampling With Spatial Information for Imbalanced Noisy Datasets
    Deng, Yi
    Li, Mingyong
    IEEE ACCESS, 2023, 11 : 122610 - 122624