A Hybrid GAN-Based Approach to Solve Imbalanced Data Problem in Recommendation Systems

Cited by: 20
Authors
Shafqat, Wafa [1]
Byun, Yung-Cheol [1]
Affiliation
[1] Jeju Natl Univ, Dept Comp Engn, Jeju 63243, South Korea
Source
IEEE Access | 2022 / Vol. 10
Keywords
Generative adversarial networks; Data models; Training; Data mining; IP networks; Generators; Numerical models; GAN; imbalanced data; oversampling; synthetic data; recommendation systems; conditional GAN; WGAN-GP; PacGAN
DOI
10.1109/ACCESS.2022.3141776
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the advent of information technology, the volume of data generated online has become massive. Recommendation systems have become an effective tool for filtering information and addressing information overload. The machine learning algorithms used to build these systems require data that is well balanced across classes, but real-world datasets are mostly imbalanced. Imbalanced data drives a classifier to focus on the majority class and neglect the other classes of interest, which hinders the predictive performance of any classification model. Many traditional techniques exist for oversampling minority classes; however, generative adversarial networks (GANs) have shown excellent results in generating realistic synthetic tabular data that preserves the probability distribution of the original data. In this paper, we propose a hybrid GAN approach that addresses the data imbalance problem in order to improve the performance of recommendation systems. We implemented a conditional Wasserstein GAN with gradient penalty (WGAN-GP) to generate tabular data containing both numerical and categorical values. We also added an auxiliary classifier loss that forces the model to explicitly generate data belonging to the minority class. We designed the discriminator following the PacGAN concept, so that it receives m packed samples as input instead of a single sample. Including the PacGAN architecture eliminated the mode collapse problem in our proposed model. We evaluated the model in two ways: first, on the quality of the generated data, and second, on how different recommendation models perform using the generated data compared with the original data.
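The abstract combines three ingredients: a condition vector that tells the generator which class to produce, a PacGAN-style critic that scores m samples at once, and the WGAN-GP gradient penalty. The sketch below illustrates the last two under stated assumptions; it is not the authors' implementation. The helper names `one_hot`, `pack`, `critic_and_grad` and `gradient_penalty` are hypothetical, the critic is a toy one-hidden-layer network D(x) = v · tanh(Wx) whose input gradient is written out by hand (a real model would use automatic differentiation), and the auxiliary classifier loss is not shown.

```python
import math
import random


def one_hot(label, n_classes):
    """Condition vector for a conditional GAN: 1.0 at the class index."""
    vec = [0.0] * n_classes
    vec[label] = 1.0
    return vec


def pack(rows, m):
    """PacGAN-style packing: concatenate every m consecutive rows into one
    critic input, so the critic judges m samples jointly."""
    if len(rows) % m != 0:
        raise ValueError("batch size must be a multiple of the pack size m")
    return [sum(rows[i:i + m], []) for i in range(0, len(rows), m)]


def critic_and_grad(x, W, v):
    """Hypothetical toy critic D(x) = v . tanh(W x) and its gradient dD/dx."""
    h = [math.tanh(sum(w_ij * x_j for w_ij, x_j in zip(row, x))) for row in W]
    score = sum(v_i * h_i for v_i, h_i in zip(v, h))
    # Chain rule: dD/dx_j = sum_i v_i * (1 - h_i^2) * W_ij
    grad = [sum(v[i] * (1.0 - h[i] ** 2) * W[i][j] for i in range(len(W)))
            for j in range(len(x))]
    return score, grad


def gradient_penalty(real, fake, W, v, rng):
    """WGAN-GP term: mean of (||grad D(x_hat)||_2 - 1)^2 over random
    interpolates x_hat = eps * real + (1 - eps) * fake."""
    total = 0.0
    for r, f in zip(real, fake):
        eps = rng.random()  # one interpolation weight per (packed) sample
        x_hat = [eps * a + (1.0 - eps) * b for a, b in zip(r, f)]
        _, grad = critic_and_grad(x_hat, W, v)
        norm = math.sqrt(sum(g * g for g in grad))
        total += (norm - 1.0) ** 2
    return total / len(real)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy batch: 2 numeric features + one-hot label (2 classes).
    real = [[0.2, 1.1] + one_hot(1, 2), [0.5, 0.9] + one_hot(1, 2),
            [1.3, 0.1] + one_hot(0, 2), [0.8, 0.4] + one_hot(1, 2)]
    # Fake rows conditioned on the (assumed) minority class 1.
    fake = [[rng.gauss(0, 1), rng.gauss(0, 1)] + one_hot(1, 2) for _ in range(4)]
    m = 2  # pack size (assumed value)
    real_packed, fake_packed = pack(real, m), pack(fake, m)
    dim = len(real_packed[0])  # m * (2 features + 2 one-hot entries)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(3)]
    v = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    print(len(real_packed), dim)  # 2 8
    print(gradient_penalty(real_packed, fake_packed, W, v, rng))
```

In a full WGAN-GP critic loss this penalty is multiplied by a coefficient λ (the original WGAN-GP paper uses λ = 10) and added to the Wasserstein term; the value used in this paper is not stated here. Packing means the penalty is computed on packed inputs, so the batch size must be a multiple of m.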
Pages: 11036 - 11047
Number of pages: 12
Related papers
50 in total
  • [21] Semi-GAN: An Improved GAN-Based Missing Data Imputation Method for the Semiconductor Industry
    Lee, Sun-Yong
    Connerton, Timothy Paul
    Lee, Yeon-Woo
    Kim, Daeyoung
    Kim, Donghwan
    Kim, Jin-Ho
    IEEE ACCESS, 2022, 10 : 72328 - 72338
  • [22] Binary Imbalanced Data Classification Based on Modified D2GAN Oversampling and Classifier Fusion
    Zhai, Junhai
    Qi, Jiaxing
    Zhang, Sufang
    IEEE ACCESS, 2020, 8 : 169456 - 169469
  • [23] ACWGAN: AN AUXILIARY CLASSIFIER WASSERSTEIN GAN-BASED OVERSAMPLING APPROACH FOR MULTI-CLASS IMBALANCED LEARNING
    Liao, Chen
    Dong, Minggang
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2022, 18 (03) : 703 - 721
  • [24] Interpretable Data-Driven Approach Based on Feature Selection Methods and GAN-Based Models for Cardiovascular Risk Prediction in Diabetic Patients
    Chushig-Muzo, David
    Calero-Diaz, Hugo
    Lara-Abelenda, Francisco J.
    Gomez-Martinez, Vanesa
    Granja, Conceicao
    Soguero-Ruiz, Cristina
    IEEE ACCESS, 2024, 12 : 84292 - 84305
  • [25] A clustering and generative adversarial networks-based hybrid approach for imbalanced data classification
    Ding, H.
    Cui, X.
    Journal of Ambient Intelligence and Humanized Computing, 2023, 14 (06) : 8003 - 8018
  • [26] An Improved D2GAN-based oversampling algorithm for imbalanced data classification
    Zhao, Xiaoqiang
    Yao, Qinglei
    STATISTICAL ANALYSIS AND DATA MINING, 2023, 16 (06) : 569 - 582
  • [27] Enhancing human action recognition with GAN-based data augmentation
    Pulakurthi, Prasanna Reddy
    de Melo, Celso M.
    Rao, Raghuveer
    Rabbani, Majid
    SYNTHETIC DATA FOR ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING: TOOLS, TECHNIQUES, AND APPLICATIONS II, 2024, 13035
  • [28] A Hybrid Approach for Binary Classification of Imbalanced Data
    Tsai, Hsinhan
    Yang, Ta-Wei
    Wong, Wai-Man
    Kao, Han-Yi
    Chou, Cheng-Fu
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS, 2024, 23 (03)
  • [29] GAN-based Matrix Factorization for Recommender Systems
    Dervishaj, Ervin
    Cremonesi, Paolo
    37TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, 2022 : 1373 - 1381
  • [30] Toward Efficiently Evaluating the Robustness of Deep Neural Networks in IoT Systems: A GAN-Based Method
    Bai, Tao
    Zhao, Jun
    Zhu, Jinlin
    Han, Shoudong
    Chen, Jiefeng
    Li, Bo
    Kot, Alex
    IEEE INTERNET OF THINGS JOURNAL, 2022, 9 (03) : 1875 - 1884