Generative Adversarial Networks for Synthetic Data Generation in Finance: Evaluating Statistical Similarities and Quality Assessment

Cited by: 5
Authors
Ramzan, Faisal [1]
Sartori, Claudio [2]
Consoli, Sergio [3]
Recupero, Diego Reforgiato [1]
Affiliations
[1] Univ Cagliari, Dept Math & Comp Sci, I-09124 Cagliari, Italy
[2] Univ Bologna, Dept Comp Sci & Engn, I-40126 Bologna, Italy
[3] European Commiss, Joint Res Ctr DG JRC, Brussels, Belgium
Keywords
generative adversarial networks; deep learning; data augmentation; synthetic data; BIG DATA;
DOI
10.3390/ai5020035
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Generating synthetic data is a complex task that necessitates accurately replicating the statistical and mathematical properties of the original data elements. In sectors such as finance, utilizing and disseminating real data for research or model development can pose substantial privacy risks owing to the inclusion of sensitive information. Additionally, authentic data may be scarce, particularly in specialized domains where acquiring ample, varied, and high-quality data is difficult or costly. This scarcity or limited data availability can limit the training and testing of machine-learning models. In this paper, we address this challenge. In particular, our task is to synthesize a dataset with similar properties to an input dataset about the stock market. The input dataset is anonymized and consists of very few columns and rows, contains many inconsistencies, such as missing rows and duplicates, and its values are not normalized, scaled, or balanced. We explore the utilization of generative adversarial networks, a deep-learning technique, to generate synthetic data and evaluate its quality compared to the input stock dataset. Our innovation involves generating artificial datasets that mimic the statistical properties of the input elements without revealing complete information. For example, synthetic datasets can capture the distribution of stock prices, trading volumes, and market trends observed in the original dataset. The generated datasets cover a wider range of scenarios and variations, enabling researchers and practitioners to explore different market conditions and investment strategies. This diversity can enhance the robustness and generalization of machine-learning models. We evaluate our synthetic data in terms of the mean, similarities, and correlations.
Pages: 667-685
Number of pages: 19
Related papers
50 in total
  • [1] Generation of Synthetic Data with Conditional Generative Adversarial Networks
    Vega-Marquez, Belen
    Rubio-Escudero, Cristina
    Nepomuceno-Chamorro, Isabel
    LOGIC JOURNAL OF THE IGPL, 2022, 30 (02) : 252 - 262
  • [2] Generation of Synthetic Tabular Healthcare Data Using Generative Adversarial Networks
    Nik, Alireza Hossein Zadeh
    Riegler, Michael A.
    Halvorsen, Pal
    Storas, Andrea M.
    MULTIMEDIA MODELING, MMM 2023, PT I, 2023, 13833 : 434 - 446
  • [3] Supporting Database Constraints in Synthetic Data Generation based on Generative Adversarial Networks
    Li, Wanxin
    SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020, : 2875 - 2877
  • [4] Synthetic Traffic Generation with Wasserstein Generative Adversarial Networks
    Wu, Chao-Lun
    Chen, Yu-Ying
    Chou, Po-Yu
    Wang, Chih-Yu
    2022 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM 2022), 2022, : 1503 - 1508
  • [5] Creation of Synthetic Data with Conditional Generative Adversarial Networks
    Vega-Marquez, Belen
    Rubio-Escudero, Cristina
    Riquelme, Jose C.
    Nepomuceno-Chamorro, Isabel
    14TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING MODELS IN INDUSTRIAL AND ENVIRONMENTAL APPLICATIONS (SOCO 2019), 2020, 950 : 231 - 240
  • [6] Synthetic demand data generation for individual electricity consumers : Generative Adversarial Networks (GANs)
    Yilmaz, Bilgi
    Korn, Ralf
    ENERGY AND AI, 2022, 9
  • [7] Synthetic Fingerprint Generation Using Generative Adversarial Networks: A Review
    Dhaneshwar, Ritika
    Taya, Arnav
    Kaur, Mandeep
    FOURTH CONGRESS ON INTELLIGENT SYSTEMS, VOL 1, CIS 2023, 2024, 868 : 375 - 387
  • [8] Generative Adversarial Networks applied to synthetic financial scenarios generation
    Rizzato, Matteo
    Wallart, Julien
    Geissler, Christophe
    Morizet, Nicolas
    Boumlaik, Noureddine
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2023, 623
  • [9] Synthetic Dataset Generation for Text Recognition with Generative Adversarial Networks
    Efimova, Valeria
    Shalamov, Viacheslav
    Filchenkov, Andrey
    TWELFTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2019), 2020, 11433
  • [10] Synthetic Behavior Sequence Generation Using Generative Adversarial Networks
    Akbari F.
    Sartipi K.
    Archer N.
    ACM Transactions on Computing for Healthcare, 2023, 4 (01):