Deterministic Autoencoder using Wasserstein loss for tabular data generation

Authors
Wang, Alex X. [1]
Nguyen, Binh P. [1,2]
Affiliations
[1] Victoria University of Wellington, School of Mathematics and Statistics, Wellington 6012, New Zealand
[2] Ho Chi Minh City Open University, Faculty of Information Technology, 97 Vo Van Tan, District 3, Ho Chi Minh City 70000, Vietnam
Keywords
Deep neural networks; Tabular data synthesis; Latent space interpolation; Generative AI; Wasserstein Autoencoder
DOI
10.1016/j.neunet.2025.107208
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Tabular data generation is challenging because of the distinctive characteristics of tabular data. Variational Autoencoders (VAEs) have been adapted from computer vision for tabular data synthesis, but their reliance on stochastic (non-deterministic) latent space regularization introduces limitations. The stochastic nature of VAEs can lead to posterior collapse, which yields suboptimal results and limits control over the latent space; it also restricts the use of latent space interpolation. To address these challenges, we present the Tabular Wasserstein Autoencoder (TWAE), which builds on the deterministic encoding mechanism of Wasserstein Autoencoders (WAEs). Deterministic encoding maps each input to a single latent code, improving the stability and expressiveness of the model's latent space. This, in turn, allows shallow interpolation methods such as the synthetic minority over-sampling technique (SMOTE) to be integrated into a deep-learning-based data generation process. Specifically, TWAE is trained once to learn a low-dimensional representation of the real data, after which various latent interpolation methods efficiently generate synthetic latent points, balancing accuracy and efficiency. Extensive experiments consistently demonstrate TWAE's superior performance and its versatility across diverse feature types and dataset sizes. By combining WAE principles with shallow interpolation, this approach effectively exploits the advantages of SMOTE, establishing TWAE as a robust solution for complex tabular data synthesis.
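The latent-space pipeline described above (encode real rows deterministically once, interpolate between latent codes SMOTE-style, decode the new codes) can be sketched as follows. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation: PCA stands in for the trained TWAE encoder and decoder; the MMD penalty with an inverse multiquadratic (IMQ) kernel is one of the latent regularizers proposed for WAEs, and the abstract does not say which regularizer TWAE uses; the kernel scale, the 2-dimensional latent space, the toy data, and the names `imq_kernel`, `mmd_penalty`, `smote_latent`, `encode` and `decode` are all illustrative assumptions.

```python
import numpy as np


def imq_kernel(x: np.ndarray, y: np.ndarray, c: float) -> np.ndarray:
    """Pairwise inverse multiquadratic kernel k(a, b) = c / (c + ||a - b||^2)."""
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return c / (c + sq_dists)


def mmd_penalty(z_enc: np.ndarray, z_prior: np.ndarray) -> float:
    """Unbiased MMD^2 estimate between encoded codes and prior samples (WAE-MMD style)."""
    n, d = z_enc.shape
    m = z_prior.shape[0]
    c = 2.0 * d  # assumed single scale: 2 * latent_dim * prior variance (1.0)
    k_qq = imq_kernel(z_enc, z_enc, c)
    k_pp = imq_kernel(z_prior, z_prior, c)
    k_qp = imq_kernel(z_enc, z_prior, c)
    # Drop the diagonal (self-similarity) terms for the unbiased within-set means.
    within_q = (k_qq.sum() - np.trace(k_qq)) / (n * (n - 1))
    within_p = (k_pp.sum() - np.trace(k_pp)) / (m * (m - 1))
    return float(within_q + within_p - 2.0 * k_qp.mean())


def smote_latent(z: np.ndarray, n_new: int, k: int = 5, seed=None) -> np.ndarray:
    """SMOTE-style interpolation between latent codes and their k nearest neighbours."""
    rng = np.random.default_rng(seed)
    sq = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(sq, axis=1)[:, :k]
    anchors = rng.integers(0, z.shape[0], size=n_new)
    partners = neighbours[anchors, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return z[anchors] + lam * (z[partners] - z[anchors])


# --- Toy usage: PCA stands in for the trained deterministic encoder/decoder. ---
rng = np.random.default_rng(0)
x_real = rng.normal(size=(200, 6))  # hypothetical numeric table
mu = x_real.mean(axis=0)
_, _, vt = np.linalg.svd(x_real - mu, full_matrices=False)
basis = vt[:2]  # 2-dimensional latent space (assumed)


def encode(x: np.ndarray) -> np.ndarray:
    return (x - mu) @ basis.T  # deterministic: the same row always gets the same code


def decode(z: np.ndarray) -> np.ndarray:
    return z @ basis + mu


z_real = encode(x_real)
# In WAE training this term would be weighted and added to the reconstruction loss.
penalty = mmd_penalty(z_real, rng.normal(size=z_real.shape))
z_new = smote_latent(z_real, n_new=50, k=5, seed=1)
x_synth = decode(z_new)
print(x_synth.shape)  # (50, 6)
```

Each synthetic code is a convex combination of a real code and one of its nearest neighbours, so it stays between existing codes in latent space. Because the encoder is deterministic, the real data only needs to be encoded once, and any number of synthetic rows can then be drawn by interpolation and decoding, which mirrors the "trained once" workflow the abstract describes.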
Pages: 14