A Hybrid GAN-Based Approach to Solve Imbalanced Data Problem in Recommendation Systems

Cited by: 20
Authors
Shafqat, Wafa [1]
Byun, Yung-Cheol [1]
Affiliation
[1] Jeju Natl Univ, Dept Comp Engn, Jeju 63243, South Korea
Source
IEEE Access, 2022, Vol. 10
Keywords
Generative adversarial networks; Data models; Training; Data mining; IP networks; Generators; Numerical models; GAN; imbalanced data; oversampling; synthetic data; recommendation systems; conditional GAN; WGAN-GP; PacGAN
DOI
10.1109/ACCESS.2022.3141776
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the advent of information technology, the volume of data generated online has become massive. Recommendation systems have become an effective tool for filtering information and addressing information overload. The machine learning algorithms used to build these systems require data that are well balanced in terms of class distribution, but real-world datasets are mostly imbalanced. Imbalanced data forces a classifier to focus on the majority class while neglecting other classes of interest, which hinders the predictive performance of any classification model. Many traditional techniques exist for oversampling minority classes, but generative adversarial networks (GANs) have shown excellent results in generating realistic synthetic tabular data that preserves the probability distribution of the original data. In this paper, we propose a hybrid GAN approach to the data imbalance problem in order to improve the performance of recommendation systems. We implemented a conditional Wasserstein GAN with gradient penalty (WGAN-GP) to generate tabular data containing both numerical and categorical values. We also added an auxiliary classifier loss to force the model to explicitly generate data belonging to the minority class. We designed the discriminator architecture following the PacGAN concept, so that it receives m packed samples as input instead of a single sample. Including the PacGAN architecture eliminated the mode collapse problem in our proposed model. We evaluated our model in two ways: first, on the quality of the generated data, and second, on how different recommendation models perform when trained on the generated data compared with the original data.
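The abstract combines three ideas: conditioning generation on the class label, oversampling the minority class, and PacGAN-style packing of samples before they reach the discriminator. The Python sketch below illustrates only the data handling around those ideas, not the paper's model. It builds a one-hot condition vector, counts how many synthetic rows each minority class would need, and packs m samples into one discriminator input. The function names (`samples_to_generate`, `one_hot`, `pack`), the fully balanced target, and the packing order are assumptions made for illustration. The networks, the WGAN-GP loss and the auxiliary classifier are omitted.

```python
from collections import Counter
from typing import Dict, List, Sequence


def samples_to_generate(labels: Sequence[str]) -> Dict[str, int]:
    """Hypothetical helper: synthetic rows each class needs to match the majority class.

    Assumption: the target is a fully balanced dataset; the paper may use another ratio.
    """
    counts = Counter(labels)
    majority = max(counts.values())
    return {cls: majority - n for cls, n in counts.items() if n < majority}


def one_hot(label: int, num_classes: int) -> List[float]:
    """Hypothetical helper: one-hot condition vector fed to a conditional GAN."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def pack(batch: Sequence[Sequence[float]], m: int) -> List[List[float]]:
    """Hypothetical helper: PacGAN-style packing of m consecutive samples.

    A batch of B samples with d features becomes B // m inputs of length m * d,
    so the discriminator judges groups of samples rather than single samples.
    """
    if m < 1 or len(batch) % m != 0:
        raise ValueError("batch size must be a positive multiple of m")
    packed = []
    for start in range(0, len(batch), m):
        row: List[float] = []
        for sample in batch[start:start + m]:
            row.extend(sample)  # concatenate the features of the m samples
        packed.append(row)
    return packed


if __name__ == "__main__":
    labels = ["like"] * 90 + ["dislike"] * 10
    print(samples_to_generate(labels))  # {'dislike': 80}
    batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(pack(batch, 2))  # [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
```

In a conditional setup, the condition vector would typically be appended to each sample (for example `sample + one_hot(label, k)`) before packing. The packing then makes a generator that repeats a single mode easier to detect, because the discriminator sees the lack of diversity within each pack.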
Pages: 11036-11047
Page count: 12