A Hybrid GAN-Based Approach to Solve Imbalanced Data Problem in Recommendation Systems

Cited by: 20
Authors
Shafqat, Wafa [1]
Byun, Yung-Cheol [1]
Affiliation
[1] Jeju Natl Univ, Dept Comp Engn, Jeju 63243, South Korea
Source
IEEE ACCESS | 2022 / Vol. 10
Keywords
Generative adversarial networks; Data models; Training; Data mining; IP networks; Generators; Numerical models; GAN; imbalanced data; oversampling; synthetic data; recommendation systems; condition GAN; WGAN-GP; PacGAN;
DOI
10.1109/ACCESS.2022.3141776
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With the advent of information technology, the volume of data generated online has become massive. Recommendation systems have become an effective tool for filtering information and mitigating information overload. The machine learning algorithms used to build these systems require data that is well balanced in terms of class distribution, but real-world datasets are mostly imbalanced. Imbalanced data causes a classifier to focus on the majority class and neglect other classes of interest, which hinders the predictive performance of any classification model. Many traditional techniques exist for oversampling minority classes. Still, generative adversarial networks (GANs) have shown excellent results in generating realistic synthetic tabular data that preserves the probability distribution of the original data. In this paper, we propose a hybrid GAN approach to solve the data imbalance problem and thereby enhance the performance of recommendation systems. We implemented a conditional Wasserstein GAN with gradient penalty to generate tabular data containing both numerical and categorical values. We also added an auxiliary classifier loss to force the model to explicitly generate data belonging to the minority class. We designed the discriminator architecture following the PacGAN concept, so that it receives m packed samples as input instead of a single sample. Including the PacGAN architecture eliminated the mode collapse problem in our proposed model. We evaluated our model in two ways: first, on the quality of the generated data, and second, on how different recommendation models perform using the generated data compared with the original data.
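The packing step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "m-packed" means concatenating m consecutive samples along the feature dimension before they reach the discriminator, as in the general PacGAN idea. The function name `pack_samples` and the toy batch are hypothetical and are not taken from the paper. Plain Python lists are used to keep the sketch dependency-free; a real implementation would more likely reshape a tensor.

```python
from typing import List, Sequence


def pack_samples(batch: Sequence[Sequence[float]], m: int) -> List[List[float]]:
    """Concatenate every m consecutive samples into one packed input.

    Assumption (not from the paper): packing is a plain concatenation of
    feature vectors, so a batch of N samples with d features becomes
    N / m packed samples with m * d features each.
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    if len(batch) % m != 0:
        raise ValueError("batch size must be divisible by the packing degree m")

    packed = []
    for start in range(0, len(batch), m):
        group = batch[start:start + m]
        # Flatten the m feature vectors of this group into a single vector.
        packed.append([value for sample in group for value in sample])
    return packed


# Hypothetical toy batch: 4 samples, each with 2 features.
batch = [[0.1, 1.0], [0.2, 0.0], [0.3, 1.0], [0.4, 0.0]]
packed = pack_samples(batch, m=2)
print(packed)  # [[0.1, 1.0, 0.2, 0.0], [0.3, 1.0, 0.4, 0.0]]
```

Because the discriminator judges several samples jointly, a generator that keeps producing near-identical rows becomes easier to detect, which is the intuition behind using packing against mode collapse.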
Pages: 11036-11047
Number of pages: 12
Related papers
50 in total
  • [1] A Hybrid GAN-Based DL Approach for the Automatic Detection of Shockable Rhythms in AED for Solving Imbalanced Data Problems
    Dahal, Kamana
    Ali, Mohd. Hasan
    ELECTRONICS, 2023, 12 (01)
  • [2] GAN-based imbalanced data intrusion detection system
    JooHwa Lee
    KeeHyun Park
    Personal and Ubiquitous Computing, 2021, 25 : 121 - 128
  • [3] GAN-based imbalanced data intrusion detection system
    Lee, JooHwa
    Park, KeeHyun
    PERSONAL AND UBIQUITOUS COMPUTING, 2021, 25 (01) : 121 - 128
  • [4] GAN-Based Semi-supervised For Imbalanced Data Classification
    Zhou, Tingting
    Liu, Wei
    Zhou, Congyu
    Chen, Leiting
    2018 4TH INTERNATIONAL CONFERENCE ON INFORMATION MANAGEMENT (ICIM2018), 2018, : 17 - 21
  • [5] FTGAN: A Novel GAN-Based Data Augmentation Method Coupled Time-Frequency Domain for Imbalanced Bearing Fault Diagnosis
    Wang, Haoyu
    Li, Peng
    Lang, Xun
    Tao, Dapeng
    Ma, Jun
    Li, Xiang
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2023, 72
  • [6] LEGAN: Addressing Intraclass Imbalance in GAN-Based Medical Image Augmentation for Improved Imbalanced Data Classification
    Ding, Hongwei
    Huang, Nana
    Wu, Yaoxin
    Cui, Xiaohui
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73 : 1 - 14
  • [8] Conditional Wasserstein GAN-based oversampling of tabular data for imbalanced learning
    Engelmann, Justin
    Lessmann, Stefan
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 174
  • [9] A GAN-Based Data Augmentation Method for Imbalanced Multi-Class Skin Lesion Classification
    Su, Qichen
    Hamed, Haza Nuzly Abdull
    Isa, Mohd Adham
    Hao, Xue
    Dai, Xin
    IEEE ACCESS, 2024, 12 : 16498 - 16513
  • [10] Producing More with Less: A GAN-based Network Attack Detection Approach for Imbalanced Data
    Hao, Xingran
    Jiang, Zhengwei
    Xiao, Qingsai
    Wang, Qiuyun
    Yao, Yepeng
    Liu, Baoxu
    Liu, Jian
    PROCEEDINGS OF THE 2021 IEEE 24TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN (CSCWD), 2021, : 384 - 390