Oversampling method based on GAN for tabular binary classification problems

Cited by: 1
Authors
Yang, Jie [1]
Jiang, Zhenhao [2]
Pan, Tingting [1]
Chen, Yueqi [1]
Pedrycz, Witold [3]
Affiliations
[1] Dalian Univ Technol, Sch Math Sci, Dalian, Liaoning, Peoples R China
[2] Chinese Univ Hong Kong Shenzhen, Sch Data Sci, Shenzhen, Guangdong, Peoples R China
[3] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB, Canada
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Oversampling; GAN; imbalanced learning; IMBALANCED DATASETS; SMOTE; TOOL;
DOI
10.3233/IDA-220383
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Data imbalance arises in many applications. A large gap between the numbers of samples in different classes skews classifiers toward the majority class and thus degrades learning performance and the quality of the results. Most data-level imbalanced learning approaches generate new samples using only the information in the minority samples, through linear generation or data distribution fitting. In contrast, we propose a novel oversampling method based on generative adversarial networks (GANs), named OS-GAN. In this method, the GAN learns the distribution characteristics of the minority class from selected majority samples rather than from random noise. As a result, samples produced by the trained generator carry information from both the majority and minority classes. Furthermore, the central regularization means the distribution of the synthetic samples is not restricted to the domain of the minority class, which can improve the generalization of learning models or algorithms. Experimental results on 14 datasets and one high-dimensional dataset show that OS-GAN outperforms 14 commonly used resampling techniques in terms of G-mean, accuracy, and F1-score.
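The three evaluation measures named in the abstract can be illustrated with a short sketch. It assumes the commonly used definitions (G-mean as the geometric mean of sensitivity and specificity, F1 computed on the minority class); the record does not state the paper's exact formulas, and the function name `binary_metrics` and the toy labels are hypothetical.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, G-mean, and F1 for a binary problem.

    Assumption: `positive` marks the minority class, and G-mean is
    sqrt(sensitivity * specificity), a common convention that may
    differ from the definitions used in the paper.
    """
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1

    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # minority recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # majority recall
    accuracy = (tp + tn) / total if total else 0.0
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    return {"accuracy": accuracy, "g_mean": g_mean, "f1": f1}


# Toy data: 90 majority samples (label 0) and 10 minority samples (label 1).
y_true = [0] * 90 + [1] * 10

# A classifier that always predicts the majority class.
always_majority = binary_metrics(y_true, [0] * 100)

# A classifier that makes 9 false alarms but recovers 8 of 10 minority samples.
y_pred_balanced = [1] * 9 + [0] * 81 + [1] * 8 + [0] * 2
balanced = binary_metrics(y_true, y_pred_balanced)

print(always_majority)
print(balanced)
```

In this toy case the always-majority classifier reaches an accuracy of 0.9 while its G-mean and F1 are both 0, which is why imbalanced-learning studies usually report G-mean and F1 alongside accuracy.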
Pages: 1287-1308
Page count: 22