Adaptive semi-unsupervised weighted oversampling (A-SUWO) for imbalanced datasets

Cited by: 193
Authors
Nekooeimehr, Iman [1]
Lai-Yuen, Susana K. [1]
Affiliations
[1] University of South Florida, Industrial and Management Systems Engineering, Tampa, FL 33620, USA
Keywords
Imbalanced dataset; Classification; Clustering; Oversampling; Performance
DOI
10.1016/j.eswa.2015.10.031
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In many applications, the dataset for classification may be highly imbalanced: most of the instances in the training set belong to one class (the majority class), while only a few instances are from the other class (the minority class). Conventional classifiers tend to strongly favor the majority class and ignore the minority instances. In this paper, we present a new oversampling method called Adaptive Semi-Unsupervised Weighted Oversampling (A-SUWO) for imbalanced binary dataset classification. The proposed method clusters the minority instances using a semi-unsupervised hierarchical clustering approach and adaptively determines how much to oversample each sub-cluster using its classification complexity and cross validation. Then, the minority instances are oversampled depending on their Euclidean distance to the majority class. A-SUWO aims to identify hard-to-learn instances by considering minority instances from each sub-cluster that are closer to the borderline. It also avoids generating synthetic minority instances that overlap with the majority class by considering the majority class in the clustering and oversampling stages. Results demonstrate that the proposed method performs significantly better than other sampling methods on most datasets. © 2015 Elsevier Ltd. All rights reserved.
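The distance-weighted oversampling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the semi-unsupervised clustering and the adaptive, cross-validated sub-cluster sizes, and it assumes an inverse-distance weighting and SMOTE-style interpolation between minority neighbors. The function name `distance_weighted_oversample` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np


def distance_weighted_oversample(X_min, X_maj, n_new, k=3, seed=0):
    """Generate n_new synthetic minority points (simplified, NOT A-SUWO itself).

    Assumption: a minority instance is weighted by the inverse of its
    Euclidean distance to the nearest majority instance, so borderline
    instances are picked more often as interpolation seeds.
    Requires at least two minority instances.
    """
    rng = np.random.default_rng(seed)

    # Distance from each minority instance to its nearest majority instance.
    d_maj = np.linalg.norm(
        X_min[:, None, :] - X_maj[None, :, :], axis=2
    ).min(axis=1)

    # Closer to the majority class => larger sampling probability.
    weights = 1.0 / (d_maj + 1e-12)
    probs = weights / weights.sum()

    # k nearest minority neighbors of every minority instance (self excluded).
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    k = min(k, len(X_min) - 1)
    neighbors = np.argsort(d_min, axis=1)[:, :k]

    # Pick seeds by weight, pick one neighbor each, interpolate between them.
    seeds = rng.choice(len(X_min), size=n_new, p=probs)
    partners = neighbors[seeds, rng.integers(0, k, size=n_new)]
    gaps = rng.random((n_new, 1))
    return X_min[seeds] + gaps * (X_min[partners] - X_min[seeds])


if __name__ == "__main__":
    X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    X_maj = np.array([[3.0, 3.0], [4.0, 4.0]])
    print(distance_weighted_oversample(X_min, X_maj, n_new=5))
```

Because every synthetic point lies on a segment between two minority instances, this sketch never extrapolates beyond the minority region; the method in the paper additionally takes the majority class into account during clustering to avoid overlap, which is not modeled here.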
Pages: 405-416
Number of pages: 12
Related papers
13 in total
  • [1] IA-SUWO: An Improving Adaptive semi-unsupervised weighted oversampling for imbalanced classification problems
    Wei Jianan
    Huang Haisong
    Yao Liguo
    Hu Yao
    Fan Qingsong
    Huang Dong
    KNOWLEDGE-BASED SYSTEMS, 2020, 203
  • [2] Improved Adaptive Semi-Unsupervised Weighted Oversampling using Sparsity Factor for Imbalanced Datasets
    Ali, Haseeb
    Salleh, Mohd Najib Mohd
    Hussain, Kashif
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2019, 10 (11) : 372 - 383
  • [3] An Adaptive Oversampling Technique for Imbalanced Datasets
    Shahee, Shaukat Ali
    Ananthakumar, Usha
    ADVANCES IN DATA MINING: APPLICATIONS AND THEORETICAL ASPECTS (ICDM 2018), 2018, 10933 : 1 - 16
  • [4] A novel adaptive boundary weighted and synthetic minority oversampling algorithm for imbalanced datasets
    Song, Xudong
    Chen, Yilin
    Liang, Pan
    Wan, Xiaohui
    Cui, Yunxian
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 44 (02) : 3245 - 3259
  • [5] AWGAN: An adaptive weighting GAN approach for oversampling imbalanced datasets
    Guan, Shaopeng
    Zhao, Xiaoyan
    Xue, Yuewei
    Pan, Hao
    INFORMATION SCIENCES, 2024, 663
  • [6] Self-adaptive oversampling method based on the complexity of minority data in imbalanced datasets classification
    Tao, Xinmin
    Guo, Xinyue
    Zheng, Yujia
    Zhang, Xiaohan
    Chen, Zhiyu
    KNOWLEDGE-BASED SYSTEMS, 2023, 277
  • [7] An Adaptive and Robust Method for Oriented Oversampling With Spatial Information for Imbalanced Noisy Datasets
    Deng, Yi
    Li, Mingyong
    IEEE ACCESS, 2023, 11 : 122610 - 122624
  • [8] Adaptive weighted over-sampling for imbalanced datasets based on density peaks clustering with heuristic filtering
    Tao, Xinmin
    Li, Qing
    Guo, Wenjie
    Ren, Chao
    He, Qing
    Liu, Rui
    Zou, JunRong
    INFORMATION SCIENCES, 2020, 519 : 43 - 73
  • [9] Development of a Neighborhood Based Adaptive Heterogeneous Oversampling Ensemble Classifier for Imbalanced Binary Class Datasets
    Subbulaxmi, S. Santha
    Arumugam, G.
    PERVASIVE COMPUTING AND SOCIAL NETWORKING, ICPCSN 2022, 2023, 475 : 353 - 361
  • [10] NCLWO: Newton's cooling law-based weighted oversampling algorithm for imbalanced datasets with feature noise
    Tao, Liangliang
    Wang, Qingya
    Zhu, Zhicheng
    Yu, Fen
    Yin, Xia
    NEUROCOMPUTING, 2024, 610