A Hybrid GAN-Based Approach to Solve Imbalanced Data Problem in Recommendation Systems

Cited by: 20
Authors
Shafqat, Wafa [1]
Byun, Yung-Cheol [1]
Affiliations
[1] Jeju Natl Univ, Dept Comp Engn, Jeju 63243, South Korea
Source
IEEE ACCESS | 2022 / Vol. 10
Keywords
Generative adversarial networks; Data models; Training; Data mining; IP networks; Generators; Numerical models; GAN; imbalanced data; oversampling; synthetic data; recommendation systems; condition GAN; WGAN-GP; PacGAN;
DOI
10.1109/ACCESS.2022.3141776
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With the advent of information technology, the volume of data generated online has grown massively. Recommendation systems have become an effective tool for filtering information and addressing information overload. The machine learning algorithms used to build these systems require data with a well-balanced class distribution, but real-world datasets are mostly imbalanced. Imbalanced data causes a classifier to focus on the majority class and neglect other classes of interest, which hinders the predictive performance of any classification model. Many traditional techniques exist for oversampling minority classes, but generative adversarial networks (GANs) have shown excellent results in generating realistic synthetic tabular data that preserves the probability distribution of the original data. In this paper, we propose a hybrid GAN approach to solve the data imbalance problem and thereby enhance the performance of recommendation systems. We implemented a conditional Wasserstein GAN with gradient penalty to generate tabular data containing both numerical and categorical values. We also added an auxiliary classifier loss to force the model to explicitly generate data belonging to the minority class. We designed the discriminator following the PacGAN concept, so that it receives m packed samples as input instead of a single sample. This inclusion of the PacGAN architecture eliminated the mode collapse problem in our proposed model. We evaluated our model in two ways: first on the quality of the generated data, and second on how different recommendation models perform using the generated data compared with the original data.
Pages: 11036 - 11047
Page count: 12
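
The abstract mentions two mechanisms that a small example may make more concrete: PacGAN-style packing of m samples into one discriminator input, and the Wasserstein critic loss with gradient penalty. The following is a minimal sketch under stated assumptions, not the authors' implementation. It uses a hypothetical linear critic so that the gradient is available in closed form, and it omits the conditional input and the auxiliary classifier loss. The function names and the penalty weight `lam=10.0` are assumptions; the record does not state the value used in the paper.

```python
import math


def pack_samples(batch, m):
    """Concatenate every m consecutive rows into one packed row.

    This mirrors the PacGAN idea of showing the discriminator m samples
    at once. The helper name and list-of-lists layout are illustrative.
    """
    if len(batch) % m != 0:
        raise ValueError("batch size must be divisible by the packing degree m")
    return [
        [value for row in batch[i:i + m] for value in row]
        for i in range(0, len(batch), m)
    ]


def linear_critic(w, b, x):
    """Hypothetical linear critic D(x) = w . x + b (a stand-in for a network)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def gradient_penalty(w, lam=10.0):
    """Gradient penalty lam * (||grad_x D|| - 1)^2.

    For a linear critic the gradient with respect to x equals w at every
    point, so no interpolation between real and fake samples is needed.
    A neural critic would need automatic differentiation here instead.
    """
    grad_norm = math.sqrt(sum(wi * wi for wi in w))
    return lam * (grad_norm - 1.0) ** 2


def critic_loss(w, b, real_packed, fake_packed, lam=10.0):
    """Critic loss: mean D(fake) - mean D(real) + gradient penalty."""
    mean_real = sum(linear_critic(w, b, x) for x in real_packed) / len(real_packed)
    mean_fake = sum(linear_critic(w, b, x) for x in fake_packed) / len(fake_packed)
    return mean_fake - mean_real + gradient_penalty(w, lam)


# Toy data: four 2-feature rows packed with m = 2 give two 4-feature rows.
real = pack_samples([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]], m=2)
fake = pack_samples([[0.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]], m=2)
loss = critic_loss([1.0, 0.0, 0.0, 0.0], 0.0, real, fake)
```

With packing, the critic's input width is m times the number of features per row, which is why the toy weight vector has four entries for two-feature data.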