Binary imbalanced data classification based on diversity oversampling by generative models

Cited by: 32
Authors
Zhai, Junhai [1]
Qi, Jiaxing [1]
Shen, Chu [1]
Affiliation
[1] Hebei Univ, Coll Math & Informat Sci, Hebei Key Lab Machine Learning & Computat Intelli, Baoding 071002, Hebei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Imbalanced learning; Binary imbalanced data classification; Diversity oversampling; Generative adversarial network; Extreme learning machine autoencoder; EXTREME LEARNING-MACHINE; SMOTE; ENSEMBLE; CLASSIFIERS;
DOI
10.1016/j.ins.2021.11.058
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In many practical applications, the data are class imbalanced. Accordingly, it is very meaningful and valuable to investigate the classification of imbalanced data. In the framework of binary imbalanced data classification, the synthetic minority oversampling technique (SMOTE) is the best-known oversampling method. However, for each positive sample, SMOTE generates only k synthetic samples on the lines between the positive sample and its k-nearest neighbors, resulting in three drawbacks: (1) SMOTE cannot effectively extend the training field of positive samples; (2) the generated positive samples lack diversity; (3) SMOTE does not accurately approximate the probability distribution of the positive samples. Therefore, two binary imbalanced data classification methods named BIDC1 and BIDC2 based on diversity oversampling by generative models are proposed. The BIDC1 and BIDC2 conduct diversity oversampling using extreme learning machine autoencoder and generative adversarial network, respectively. Extensive experiments on 26 data sets are conducted to compare the two methods with 14 state-of-the-art methods using five metrics: F-measure, G-means, AUC-area, MMD-score, and Silhouette-score. The experimental results demonstrate that the two methods outperform the other 14 methods. (c) 2021 Elsevier Inc. All rights reserved.
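The SMOTE behaviour described in the abstract (one synthetic sample on the line segment from each positive sample to each of its k nearest positive neighbors) can be sketched as below. This is a minimal illustration under assumptions, not the authors' code and not the BIDC1/BIDC2 methods: the function name `smote_like_oversample`, the toy data, and the uniform interpolation weight are hypothetical choices, and library implementations of SMOTE typically pick neighbors at random according to an oversampling rate rather than using all k.

```python
import math
import random


def smote_like_oversample(minority, k=3, seed=0):
    """Hypothetical helper: for each minority point, create one synthetic
    point on the segment to each of its k nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # Rank the other minority points by Euclidean distance to x.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        for nb in neighbors:
            lam = rng.random()  # interpolation weight in [0, 1)
            # Linear interpolation: the new point lies between x and nb.
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Toy positive class: the four corners of the unit square (assumed data).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = smote_like_oversample(minority, k=2)
```

Because every synthetic point is a convex combination of two existing positive samples, none of them can fall outside the region already spanned by the positive class, which is one way to read drawback (1) in the abstract.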
Pages: 313-343
Page count: 31
Related papers
50 in total
  • [1] On oversampling imbalanced data with deep conditional generative models
    Fajardo, Val Andrei
    Findlay, David
    Jaiswal, Charu
    Yin, Xinshang
    Houmanfar, Roshanak
    Xie, Honglei
    Liang, Jiaxi
    She, Xichen
    Emerson, D. B.
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 169
  • [2] Conditional Data Synthesis with Deep Generative Models for Imbalanced Dataset Oversampling
    Akritidis, Leonidas
    Fevgas, Athanasios
    Alamaniotis, Miltiadis
    Bozanis, Panayiotis
    2023 IEEE 35TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2023, : 444 - 451
  • [3] Gaussian Distribution Based Oversampling for Imbalanced Data Classification
    Xie, Yuxi
    Qiu, Min
    Zhang, Haibo
    Peng, Lizhi
    Chen, Zhenxiang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (02) : 667 - 679
  • [4] Oversampling Highly Imbalanced Indoor Positioning Data using Deep Generative Models
    Alhomayani, Fahad
    Mahoor, Mohammad H.
    2021 IEEE SENSORS, 2021,
  • [5] Adaptive Oversampling for Imbalanced Data Classification
    Ertekin, Seyda
    INFORMATION SCIENCES AND SYSTEMS 2013, 2013, 264 : 261 - 269
  • [6] Radial-Based oversampling for noisy imbalanced data classification
    Koziarski, Michal
    Krawczyk, Bartosz
    Wozniak, Michal
    NEUROCOMPUTING, 2019, 343 : 19 - 33
  • [7] Radial-Based Oversampling for Multiclass Imbalanced Data Classification
    Krawczyk, Bartosz
    Koziarski, Michal
    Wozniak, Michal
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (08) : 2818 - 2831
  • [8] Binary Imbalanced Data Classification Based on Modified D2GAN Oversampling and Classifier Fusion
    Zhai, Junhai
    Qi, Jiaxing
    Zhang, Sufang
    IEEE ACCESS, 2020, 8 : 169456 - 169469
  • [9] Deep generative approaches for oversampling in imbalanced data classification problems: A comprehensive review and comparative analysis
    Shirvan, Mozafar Hayaeian
    Moattar, Mohammad Hossein
    Hosseinzadeh, Mehdi
    APPLIED SOFT COMPUTING, 2025, 170
  • [10] A NOVEL RULE-BASED OVERSAMPLING APPROACH FOR IMBALANCED DATA CLASSIFICATION
    Zhang, Xiao
    Paz, Ivan
    Nebot, Angela
    37TH ANNUAL EUROPEAN SIMULATION AND MODELLING CONFERENCE 2023, ESM 2023, 2023, : 208 - 212