Evaluation of synthetic electronic health records: A systematic review and experimental assessment

Cited by: 1

Authors:

Budu, Emmanuella ^{[1]}

Etminani, Kobra ^{[1]}

Soliman, Amira ^{[1]}

Rognvaldsson, Thorsteinn ^{[1]}

Affiliations:

[1] Halmstad Univ, Ctr Appl Intelligent Syst Res CAISR, Kristian IV s vag 3, S-30118 Halmstad, Sweden

Source:

NEUROCOMPUTING | 2024 / Vol. 603

Keywords:

Synthetic data; Electronic health records (EHRs); Evaluation; GENERATION; FRAMEWORK; PRIVACY;

DOI:

10.1016/j.neucom.2024.128253

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recent studies have shown how synthetic data generation methods can be applied to electronic health records (EHRs) to obtain synthetic versions that do not violate privacy rules. This growing body of research has resulted in the emergence of numerous methods for evaluating the quality of generated data, with new publications often introducing novel evaluation methods. This work presents a detailed review of synthetic EHRs, focusing on the various evaluation methods used to assess the quality of the generated EHRs. We discuss the existing evaluation methods, offering insights into their use as well as providing an interpretation of the evaluation metrics from the perspectives of achieving fidelity, utility and privacy. Furthermore, we highlight the key factors influencing the selection of evaluation methods, such as the type of data (e.g., categorical, continuous, or discrete) and the mode of application (e.g., patient level, cohort level, and feature level). To assess the effectiveness of current evaluation measures, we conduct a series of experiments to shed light on the potential limitations of these measures. The findings from these experiments reveal notable shortcomings, including the need for meticulous application of methods to the data to reduce inconsistent evaluations, the qualitative nature of some assessments subject to individual judgment, the need for clinical validations, and the absence of techniques to evaluate temporal dependencies within the data. This highlights the need to place greater emphasis on evaluation measures, their application, and the development of comprehensive evaluation frameworks as it is crucial for advancing progress in this field.


Pages: 21

