Bt-GAN: Generating Fair Synthetic Healthdata via Bias-transforming Generative Adversarial Networks

Cited by: 0
Authors
Ramachandranpillai, Resmi [1]
Sikder, Md Fahim [1]
Bergstrom, David [1]
Heintz, Fredrik [1]
Affiliations
[1] Linkoping Univ, Dept Comp & Informat Sci IDA, Linkoping, Sweden
© 2024 The Authors
DOI
10.1613/jair.1.15317
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Synthetic data generation offers a promising way to enhance the usefulness of Electronic Healthcare Records (EHR) by generating realistic de-identified data. However, the existing literature primarily focuses on the quality of synthetic health data, neglecting the crucial aspect of fairness in downstream predictions. Consequently, models trained on synthetic EHR have faced criticism for producing biased outcomes in target tasks. These biases can arise from either (i) spurious correlations between features or (ii) the failure of models to accurately represent sub-groups. To address these concerns, we present Bias-transforming Generative Adversarial Networks (Bt-GAN), a GAN-based synthetic data generator specifically designed for the healthcare domain. To tackle spurious correlations (i), we propose an information-constrained Data Generation Process (DGP) that enables the generator to learn a fair deterministic transformation based on a well-defined notion of algorithmic fairness. To overcome the challenge of capturing exact sub-group representations (ii), we incentivize the generator to preserve sub-group densities through score-based weighted sampling. This approach compels the generator to learn from underrepresented regions of the data manifold. To evaluate the effectiveness of our proposed method, we conduct extensive experiments using the Medical Information Mart for Intensive Care (MIMIC-III) database. Our results demonstrate that Bt-GAN achieves state-of-the-art accuracy while significantly improving fairness and minimizing bias amplification. Furthermore, we perform an in-depth explainability analysis to provide additional evidence supporting the validity of our study. In conclusion, our research introduces a novel approach to addressing the limitations of synthetic data generation in the healthcare domain. By incorporating fairness considerations and leveraging advanced techniques such as GANs, we pave the way for more reliable and unbiased predictions in healthcare applications.
Pages: 1313-1341
Page count: 29