Constrained Oversampling: An Oversampling Approach to Reduce Noise Generation in Imbalanced Datasets With Class Overlapping

Cited by: 17
Authors
Liu, Changhui [1]
Jin, Sun [2]
Wang, Donghong [3]
Luo, Zichao [4]
Yu, Jianbo [1]
Zhou, Binghai [1]
Yang, Changlin [5]
Affiliations
[1] Tongji Univ, Sch Mech Engn, Shanghai 201804, Peoples R China
[2] Shanghai Jiao Tong Univ, Sch Mech Engn, Shanghai 200240, Peoples R China
[3] Shanghai Jiao Tong Univ, Sch Mat Sci & Engn, Shanghai 200240, Peoples R China
[4] Tokyo Inst Technol, Yoshino & Yamamoto Lab, Tokyo 1528550, Japan
[5] Northwestern Polytech Univ, State Key Lab Solidificat Proc, Xian 710000, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Classification algorithms; Proposals; Sampling methods; Ant colony optimization; Data models; Prediction algorithms; Data structures; Constrained oversampling; oversampling; class overlapping; imbalanced dataset; ANT COLONY OPTIMIZATION; DATA-SETS; CLASSIFICATION; SMOTE; REGRESSION;
DOI
10.1109/ACCESS.2020.3018911
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Imbalanced datasets are pervasive in classification tasks and degrade classifier performance in predicting minority samples. Oversampling is effective in resolving the class imbalance problem. However, existing oversampling methods generally introduce noisy examples into the original dataset, especially when the dataset contains class-overlapping regions. In this study, a new oversampling method named Constrained Oversampling is proposed to reduce noise generation in oversampling. The algorithm first extracts the overlapping regions in the dataset. Second, Ant Colony Optimization is applied to define the boundaries of the minority regions. Third, oversampling under constraints is employed to synthesize new samples and obtain a balanced dataset. Our proposal distinguishes itself from other techniques by incorporating constraints in the oversampling process to inhibit noise generation. Experiments show that it outperforms various benchmark oversampling approaches. The effectiveness of the method is explained by studying the impact of class overlapping on imbalanced learning.
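The general idea of oversampling under a constraint can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the details of the overlap extraction or of the Ant Colony Optimization step, so the sketch swaps in a much simpler stand-in constraint, assumed purely for illustration. A SMOTE-style candidate is interpolated between two minority samples and accepted only if its nearest original sample belongs to the minority class. The function name `constrained_oversample` and the parameters `k`, `seed`, and `max_tries` are hypothetical, and at least two minority samples are assumed.

```python
import numpy as np


def constrained_oversample(X, y, minority=1, k=5, seed=0, max_tries=10000):
    """Illustrative stand-in, NOT the paper's method.

    Interpolates between minority samples (SMOTE-style) and rejects any
    candidate whose nearest original sample is not a minority sample.
    Assumes at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority]

    # Number of synthetic samples needed to match the non-minority count.
    n_needed = int((y != minority).sum() - (y == minority).sum())
    synthetic = []
    tries = 0
    while len(synthetic) < n_needed and tries < max_tries:
        tries += 1
        i = rng.integers(len(X_min))
        # Up to k nearest minority neighbours of the seed, excluding itself.
        dist = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dist)[1:k + 1]
        j = rng.choice(neighbours)
        # Random point on the segment between the seed and its neighbour.
        candidate = X_min[i] + rng.random() * (X_min[j] - X_min[i])
        # Assumed constraint: nearest original sample must be minority.
        nearest = np.argmin(np.linalg.norm(X - candidate, axis=1))
        if y[nearest] == minority:
            synthetic.append(candidate)

    if not synthetic:
        return X.copy(), y.copy()
    X_new = np.vstack([X, np.array(synthetic)])
    y_new = np.concatenate([y, np.full(len(synthetic), minority, dtype=y.dtype)])
    return X_new, y_new
```

In this sketch, candidates that land closest to a majority sample are discarded rather than added, which is one simple way to limit noise in overlapping regions. If `max_tries` is exhausted first, the returned dataset may remain imbalanced.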
Pages: 91452-91465
Number of pages: 14
Related papers
50 in total
  • [31] A new oversampling approach based differential evolution on the safe set for highly imbalanced datasets
    Zhang, Jiaoni
    Li, Yanying
    Zhang, Baoshuang
    Wang, Xialin
    Gong, Huanhuan
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 234
  • [32] Noise-robust oversampling for imbalanced data classification
    Liu, Yongxu
    Liu, Yan
    Yu, Bruce X. B.
    Zhong, Shenghua
    Hu, Zhejing
    PATTERN RECOGNITION, 2023, 133
  • [33] Oversampling for Mining Imbalanced Datasets: Taxonomy and Performance Evaluation
    Jedrzejowicz, Piotr
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2022, 2022, 13501 : 322 - 333
  • [34] Evidential Undersampling Approach for Imbalanced Datasets with Class-Overlapping and Noise
    Grina, Fares
    Elouedi, Zied
    Lefevre, Eric
    MODELING DECISIONS FOR ARTIFICIAL INTELLIGENCE (MDAI 2021), 2021, 12898 : 181 - 192
  • [35] ODBOT: Outlier detection-based oversampling technique for imbalanced datasets learning
    Ibrahim, Mohammed H.
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (22) : 15781 - 15806
  • [36] Density-induced oversampling for highly imbalanced datasets
    Fecker, Daniel
    Maergner, Volker
    Fingscheidt, Tim
    IMAGE PROCESSING: MACHINE VISION APPLICATIONS VI, 2013, 8661
  • [37] A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering
    Cao, Jie
    Shi, Yong
    TEHNICKI VJESNIK-TECHNICAL GAZETTE, 2021, 28 (06): : 1813 - 1819
  • [38] Global-local information based oversampling for multi-class imbalanced data
    Han, Mingming
    Guo, Husheng
    Li, Jinyan
    Wang, Wenjian
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 (06) : 2071 - 2086
  • [39] TGT: A Novel Adversarial Guided Oversampling Technique for Handling Imbalanced Datasets
    Mahmoud, Ayat
    El-Kilany, Ayman
    Ali, Farid
    Mazen, Sherif
    EGYPTIAN INFORMATICS JOURNAL, 2021, 22 (04) : 433 - 438
  • [40] An Oversampling Method for Class Imbalance Problems on Large Datasets
    Rodriguez-Torres, Fredy
    Martinez-Trinidad, Jose F.
    Carrasco-Ochoa, Jesus A.
    APPLIED SCIENCES-BASEL, 2022, 12 (07):