Bt-GAN: Generating Fair Synthetic Healthdata via Bias-transforming Generative Adversarial Networks

Cited by: 0
Authors
Ramachandranpillai, Resmi [1]
Sikder, Md Fahim [1]
Bergstrom, David [1]
Heintz, Fredrik [1]
Affiliations
[1] Linkoping Univ, Dept Comp & Informat Sci IDA, Linkoping, Sweden
© 2024 The Authors
DOI
10.1613/jair.1.15317
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Synthetic data generation offers a promising solution to enhance the usefulness of Electronic Healthcare Records (EHR) by generating realistic de-identified data. However, the existing literature primarily focuses on the quality of synthetic health data, neglecting the crucial aspect of fairness in downstream predictions. Consequently, models trained on synthetic EHR have faced criticism for producing biased outcomes in target tasks. These biases can arise from either spurious correlations between features or the failure of models to accurately represent sub-groups. To address these concerns, we present Bias-transforming Generative Adversarial Networks (Bt-GAN), a GAN-based synthetic data generator specifically designed for the healthcare domain. In order to tackle spurious correlations (i), we propose an information-constrained Data Generation Process (DGP) that enables the generator to learn a fair deterministic transformation based on a well-defined notion of algorithmic fairness. To overcome the challenge of capturing exact sub-group representations (ii), we incentivize the generator to preserve sub-group densities through score-based weighted sampling. This approach compels the generator to learn from underrepresented regions of the data manifold. To evaluate the effectiveness of our proposed method, we conduct extensive experiments using the Medical Information Mart for Intensive Care (MIMIC-III) database. Our results demonstrate that Bt-GAN achieves state-of-the-art accuracy while significantly improving fairness and minimizing bias amplification. Furthermore, we perform an in-depth explainability analysis to provide additional evidence supporting the validity of our study. In conclusion, our research introduces a novel and professional approach to addressing the limitations of synthetic data generation in the healthcare domain. By incorporating fairness considerations and leveraging advanced techniques such as GANs, we pave the way for more reliable and unbiased predictions in healthcare applications.
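Illustrative sketch (not from the paper): the abstract mentions weighted sampling that pushes the generator toward underrepresented sub-groups, but this record does not give the score function Bt-GAN uses. The Python below is a minimal stand-in that assumes inverse sub-group frequency as the sampling score. The names `inverse_frequency_weights` and `weighted_batch`, and the toy groups "A" and "B", are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def inverse_frequency_weights(groups):
    """Assumed score: weight each record by 1 / (size of its sub-group).

    The weights are normalized to sum to 1, so every sub-group ends up
    with the same total sampling mass regardless of its size.
    """
    counts = Counter(groups)
    raw = [1.0 / counts[g] for g in groups]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_batch(records, weights, batch_size, rng):
    """Draw a training batch with replacement, proportional to the weights."""
    return rng.choices(records, weights=weights, k=batch_size)


# Toy data: sub-group "B" makes up only 10% of the records.
groups = ["A"] * 90 + ["B"] * 10
records = list(range(len(groups)))  # record ids stand in for EHR rows

weights = inverse_frequency_weights(groups)
rng = random.Random(0)  # fixed seed so the draw is reproducible
batch = weighted_batch(records, weights, 1000, rng)

# Count how often each sub-group appears in the sampled batch.
batch_groups = Counter(groups[i] for i in batch)
print(batch_groups)
```

Under this assumed score, each sub-group receives equal total sampling mass, so the minority group appears in batches far more often than its 10% share of the data. A density-preserving score, as the abstract describes, would presumably be calibrated differently; this sketch only shows the mechanics of drawing batches from per-record weights.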
Pages: 1313-1341
Page count: 29