Bt-GAN: Generating Fair Synthetic Healthdata via Bias-transforming Generative Adversarial Networks

Times cited: 0

Authors:

Ramachandranpillai, Resmi [1]

Sikder, Md Fahim [1]

Bergstrom, David [1]

Heintz, Fredrik [1]

Affiliation:

[1] Linkoping Univ, Dept Comp & Informat Sci IDA, Linkoping, Sweden

Source:

JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH | 2024 / Vol. 79

Copyright: © 2024 The Authors

DOI:

10.1613/jair.1.15317

Chinese Library Classification (CLC) number:

TP18 [Theory of Artificial Intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Synthetic data generation offers a promising solution to enhance the usefulness of Electronic Healthcare Records (EHR) by generating realistic de-identified data. However, the existing literature primarily focuses on the quality of synthetic health data, neglecting the crucial aspect of fairness in downstream predictions. Consequently, models trained on synthetic EHR have faced criticism for producing biased outcomes in target tasks. These biases can arise from either spurious correlations between features or the failure of models to accurately represent sub-groups. To address these concerns, we present Bias-transforming Generative Adversarial Networks (Bt-GAN), a GAN-based synthetic data generator specifically designed for the healthcare domain. To tackle spurious correlations (i), we propose an information-constrained Data Generation Process (DGP) that enables the generator to learn a fair deterministic transformation based on a well-defined notion of algorithmic fairness. To overcome the challenge of capturing exact sub-group representations (ii), we incentivize the generator to preserve sub-group densities through score-based weighted sampling. This approach compels the generator to learn from underrepresented regions of the data manifold. To evaluate the effectiveness of our proposed method, we conduct extensive experiments using the Medical Information Mart for Intensive Care (MIMIC-III) database. Our results demonstrate that Bt-GAN achieves state-of-the-art accuracy while significantly improving fairness and minimizing bias amplification. Furthermore, we perform an in-depth explainability analysis to provide additional evidence supporting the validity of our study. In conclusion, our research introduces a novel approach to addressing the limitations of synthetic data generation in the healthcare domain.
By incorporating fairness considerations and leveraging advanced techniques such as GANs, we pave the way for more reliable and unbiased predictions in healthcare applications.
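As a rough illustration of how weighted sampling can push a generator toward underrepresented regions of the data, the Python sketch below draws training minibatches with per-record weights inversely proportional to the size of each record's sub-group. This is an assumption for illustration only: the abstract does not specify Bt-GAN's score function, and the helpers `subgroup_weights` and `weighted_minibatch` are hypothetical names, not the authors' code.

```python
import random
from collections import Counter


def subgroup_weights(groups):
    """Hypothetical stand-in for a density score: one weight per record,
    inversely proportional to the size of that record's sub-group."""
    counts = Counter(groups)
    return [1.0 / counts[g] for g in groups]


def weighted_minibatch(records, groups, batch_size, seed=0):
    """Draw a minibatch with replacement using inverse-frequency weights,
    so every sub-group is selected with equal expected probability."""
    rng = random.Random(seed)
    weights = subgroup_weights(groups)
    return rng.choices(records, weights=weights, k=batch_size)


# Toy data: 90 records from sub-group "A", 10 from sub-group "B".
records = list(range(100))
groups = ["A"] * 90 + ["B"] * 10

batch = weighted_minibatch(records, groups, batch_size=1000)
share_b = sum(1 for r in batch if groups[r] == "B") / len(batch)
print(f"share of sub-group B in weighted batch: {share_b:.2f}")  # close to 0.5, versus 0.1 in the raw data
```

With these weights each sub-group contributes equally in expectation; a softer weighting may be preferable in practice so that very small groups are not overfit.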
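The abstract reports improved fairness and reduced bias amplification without defining the metrics in this record. A minimal sketch, assuming demographic parity as the fairness notion and defining bias amplification simply as the increase in the parity gap from ground-truth labels to model predictions; both choices, the toy data, and the helper names are hypothetical and may differ from the measures used in the paper.

```python
def positive_rate(values, groups, group):
    """Fraction of positive (1) entries among records in `group`."""
    selected = [v for v, g in zip(values, groups) if g == group]
    return sum(selected) / len(selected)


def parity_gap(values, groups, a, b):
    """Demographic-parity gap |P(v=1 | group a) - P(v=1 | group b)|."""
    return abs(positive_rate(values, groups, a) - positive_rate(values, groups, b))


def bias_amplification(labels, predictions, groups, a, b):
    """Assumed simple definition: how much larger the parity gap is in the
    predictions than in the labels (positive means bias was amplified)."""
    return parity_gap(predictions, groups, a, b) - parity_gap(labels, groups, a, b)


# Toy example with a binary sensitive attribute and a binary outcome.
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]
labels = [1, 0, 1, 0, 1, 1, 0, 0]
predictions = [1, 0, 0, 0, 1, 1, 1, 0]

print(parity_gap(labels, groups, "F", "M"))                       # 0.0
print(parity_gap(predictions, groups, "F", "M"))                  # 0.5
print(bias_amplification(labels, predictions, groups, "F", "M"))  # 0.5
```

Here the labels are balanced across groups, but the predictions are not, so the sketch reports a positive amplification.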


Pages: 1313-1341

Page count: 29
