Bt-GAN: Generating Fair Synthetic Healthdata via Bias-transforming Generative Adversarial Networks

Cited by: 0
Authors
Ramachandranpillai, Resmi [1]
Sikder, Md Fahim [1]
Bergstrom, David [1]
Heintz, Fredrik [1]
Affiliation
[1] Linköping University, Department of Computer and Information Science (IDA), Linköping, Sweden
© 2024 The Authors
DOI
10.1613/jair.1.15317
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Synthetic data generation offers a promising solution to enhance the usefulness of Electronic Healthcare Records (EHR) by generating realistic de-identified data. However, the existing literature primarily focuses on the quality of synthetic health data, neglecting the crucial aspect of fairness in downstream predictions. Consequently, models trained on synthetic EHR have faced criticism for producing biased outcomes in target tasks. These biases can arise from either spurious correlations between features or the failure of models to accurately represent sub-groups. To address these concerns, we present Bias-transforming Generative Adversarial Networks (Bt-GAN), a GAN-based synthetic data generator specifically designed for the healthcare domain. In order to tackle spurious correlations (i), we propose an information-constrained Data Generation Process (DGP) that enables the generator to learn a fair deterministic transformation based on a well-defined notion of algorithmic fairness. To overcome the challenge of capturing exact sub-group representations (ii), we incentivize the generator to preserve sub-group densities through score-based weighted sampling. This approach compels the generator to learn from underrepresented regions of the data manifold. To evaluate the effectiveness of our proposed method, we conduct extensive experiments using the Medical Information Mart for Intensive Care (MIMIC-III) database. Our results demonstrate that Bt-GAN achieves state-of-the-art accuracy while significantly improving fairness and minimizing bias amplification. Furthermore, we perform an in-depth explainability analysis to provide additional evidence supporting the validity of our study. In conclusion, our research introduces a novel and professional approach to addressing the limitations of synthetic data generation in the healthcare domain. 
By incorporating fairness considerations and leveraging advanced techniques such as GANs, we pave the way for more reliable and unbiased predictions in healthcare applications.
Pages: 1313–1341
Page count: 29