A Hybrid GAN-Based Approach to Solve Imbalanced Data Problem in Recommendation Systems

Cited by: 20
Authors
Shafqat, Wafa [1]
Byun, Yung-Cheol [1]
Affiliations
[1] Jeju Natl Univ, Dept Comp Engn, Jeju 63243, South Korea
Source
IEEE ACCESS | 2022 / Vol. 10
Keywords
Generative adversarial networks; Data models; Training; Data mining; IP networks; Generators; Numerical models; GAN; imbalanced data; oversampling; synthetic data; recommendation systems; condition GAN; WGAN-GP; PacGAN;
DOI
10.1109/ACCESS.2022.3141776
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the advent of information technology, the volume of data generated online has become massive. Recommendation systems have become an effective tool for filtering information and solving the problem of information overload. The machine learning algorithms used to build these recommendation systems require data that is well balanced in terms of class distribution, but real-world datasets are mostly imbalanced. Imbalanced data pushes a classifier to focus on the majority class and neglect other classes of interest, which hinders the predictive performance of any classification model. Many traditional techniques exist for oversampling minority classes; generative adversarial networks (GANs), however, have shown excellent results in generating realistic synthetic tabular data that preserves the probability distribution of the original data. In this paper, we propose a hybrid GAN approach that addresses the data imbalance problem in order to enhance the performance of recommendation systems. We implemented a conditional Wasserstein GAN with gradient penalty to generate tabular data containing both numerical and categorical values. We also added an auxiliary classifier loss to force the model to explicitly generate data belonging to the minority class. We designed the discriminator architecture using the PacGAN concept, so that it receives m packed samples as input instead of a single sample. This inclusion of the PacGAN architecture eliminated the mode collapse problem in our proposed model. We evaluated our model in two ways: first, on the quality of the generated data, and second, on how different recommendation models perform when using the generated data compared to the original data.
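The packing step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `pack_samples`, the toy data, and the choice to drop leftover rows are all assumptions made for illustration. It only shows the general PacGAN idea of concatenating m samples from the same source (all real or all generated) into one discriminator input.

```python
import random
from typing import List, Sequence


def pack_samples(rows: Sequence[Sequence[float]], m: int) -> List[List[float]]:
    """Concatenate every m consecutive rows into one packed input.

    Hypothetical helper: rows that do not fill a complete pack are dropped
    (an assumption of this sketch, not something stated in the abstract).
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    n_packs = len(rows) // m
    packed = []
    for i in range(n_packs):
        group = rows[i * m:(i + 1) * m]
        # Flatten the m rows into a single vector of width m * n_features.
        packed.append([value for row in group for value in row])
    return packed


# Toy stand-in for a batch of minority-class rows with 3 features each.
random.seed(0)
minority_rows = [[random.random() for _ in range(3)] for _ in range(10)]

packed = pack_samples(minority_rows, m=4)
print(len(packed), len(packed[0]))  # 2 packs, each of width 12
```

Because the discriminator scores several samples jointly, a generator that produces near-identical rows becomes easier to detect, which is the intuition behind using packing against mode collapse.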
Pages: 11036-11047
Page count: 12