Unsupervised anomaly detection of implausible electronic health records: a real-world evaluation in cancer registries

Cited by: 4

Authors:

Roechner, Philipp [1]

Rothlauf, Franz [1]

Affiliations:

[1] Johannes Gutenberg Univ Mainz, Informat Syst & Business Adm, Jakob Welder Weg 9, D-55128 Mainz, Germany

Source:

BMC MEDICAL RESEARCH METHODOLOGY | 2023 / Volume 23 / Issue 01

Keywords:

Anomaly detection; Outlier detection; Data quality; Quality control; Electronic health records; Medical records; Cancer registration; Neural network; Machine learning; Artificial intelligence;

DOI:

10.1186/s12874-023-01946-0

Chinese Library Classification (CLC) number:

R19 [Health care organization and services (health services administration)];

Subject classification number:

Abstract:

Background: Cancer registries collect patient-specific information about cancer diseases. The collected information is verified and made available to clinical researchers, physicians, and patients. When processing information, cancer registries verify that the patient-specific records they collect are plausible, meaning that the collected information about a particular patient makes medical sense.

Methods: Unsupervised machine learning approaches can detect implausible electronic health records without human guidance. Therefore, this article investigates two unsupervised anomaly detection approaches, a pattern-based approach (FindFPOF) and a compression-based approach (autoencoder), to identify implausible electronic health records in cancer registries. Unlike most existing work, which analyzes synthetic anomalies, we compare the performance of both approaches and a baseline (random selection of records) on a real-world dataset. The dataset contains 21,104 electronic health records of patients with breast, colorectal, and prostate tumors. Each record consists of 16 categorical variables describing the disease, the patient, and the diagnostic procedure. The samples identified by FindFPOF, the autoencoder, and a random selection (a total of 785 different records) are evaluated in a real-world scenario by medical domain experts.

Results: Both anomaly detection methods are good at detecting implausible electronic health records. First, domain experts identified 8% of 300 randomly selected records as implausible. With FindFPOF and the autoencoder, 28% of the 300 records proposed in each sample were implausible. This corresponds to a precision of 28% for both FindFPOF and the autoencoder. Second, for 300 randomly selected records that were labeled by domain experts, the sensitivity of the autoencoder was 22% and the sensitivity of FindFPOF was 26%. Both anomaly detection methods had a specificity of 94%. Third, FindFPOF and the autoencoder suggested samples with a different distribution of values than the overall dataset. For example, both anomaly detection methods suggested a higher proportion of colorectal records, the tumor localization with the highest percentage of implausible records in a randomly selected sample.

Conclusions: Unsupervised anomaly detection can significantly reduce the manual effort of domain experts to find implausible electronic health records in cancer registries. In our experiments, the manual effort was reduced by a factor of approximately 3.5 compared to evaluating a random sample.
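The reported effort reduction can be reproduced from the two precision figures in the abstract. The following is a minimal arithmetic sketch, not the authors' evaluation code: the function names are hypothetical, and it assumes that "manual effort" means the expected number of records an expert must review to find one implausible record, and that the rounded percentages (8% and 28%) are exact.

```python
# Precision values as reported in the abstract (share of implausible records
# among the 300 records in each sample). Treated as exact here, which is an
# assumption: the abstract gives rounded percentages.
PRECISION_RANDOM = 0.08
PRECISION_DETECTOR = 0.28  # same value reported for FindFPOF and the autoencoder
SAMPLE_SIZE = 300


def expected_reviews_per_hit(precision: float) -> float:
    """Expected number of records reviewed to find one implausible record."""
    return 1.0 / precision


def effort_reduction(precision_detector: float, precision_baseline: float) -> float:
    """Ratio of baseline review effort to detector-guided review effort."""
    return expected_reviews_per_hit(precision_baseline) / expected_reviews_per_hit(
        precision_detector
    )


factor = effort_reduction(PRECISION_DETECTOR, PRECISION_RANDOM)

# Approximate counts of implausible records implied by the percentages.
hits_random = round(PRECISION_RANDOM * SAMPLE_SIZE)
hits_detector = round(PRECISION_DETECTOR * SAMPLE_SIZE)

print(f"Reviews per implausible record (random):   {expected_reviews_per_hit(PRECISION_RANDOM):.1f}")
print(f"Reviews per implausible record (detector): {expected_reviews_per_hit(PRECISION_DETECTOR):.1f}")
print(f"Effort reduction factor: {factor:.1f}")
```

Under these assumptions, the factor is simply the ratio of the two precisions, which matches the "approximately 3.5" stated in the conclusions.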


Pages: 14

50 items in total

[1] Unsupervised anomaly detection of implausible electronic health records: a real-world evaluation in cancer registries
Philipp Röchner
Franz Rothlauf
BMC Medical Research Methodology, 23
[2] The MVTec Anomaly Detection Dataset: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection
Paul Bergmann
Kilian Batzner
Michael Fauser
David Sattlegger
Carsten Steger
International Journal of Computer Vision, 2021, 129 : 1038 - 1059
[3] The MVTec Anomaly Detection Dataset: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection
Bergmann, Paul
Batzner, Kilian
Fauser, Michael
Sattlegger, David
Steger, Carsten
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2021, 129 (04) : 1038 - 1059
[4] Mining Electronic Health Records for Real-World Evidence
Zang, Chengxi
Pan, Weishen
Wang, Fei
PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 5837 - 5838
[5] Quantum Kernels for Real-World Predictions Based on Electronic Health Records
Krunic Z.
Flother F.
Seegan G.
Earnest-Noble N.
Omar S.
IEEE Transactions on Quantum Engineering, 2022, 3
[6] A Comprehensive Real-World Photometric Stereo Dataset for Unsupervised Anomaly Detection
Jung, Junyong
Han, Seungoh
Park, Jinsun
Cho, Donghyeon
IEEE ACCESS, 2022, 10 : 108914 - 108923
[7] Approach to machine learning for extraction of real-world data variables from electronic health records
Adamson, Blythe
Waskom, Michael
Blarre, Auriane
Kelly, Jonathan
Krismer, Konstantin
Nemeth, Sheila
Gippetti, James
Ritten, John
Harrison, Katherine
Ho, George
Linzmayer, Robin
Bansal, Tarun
Wilkinson, Samuel
Amster, Guy
Estola, Evan
Benedum, Corey M.
Fidyk, Erin
Estevez, Melissa
Shapiro, Will
Cohen, Aaron B.
FRONTIERS IN PHARMACOLOGY, 2023, 14
[8] A Post-Marketing Drug Evaluation Framework Based on Real-World Electronic Health Records Data
Wang, Yu
Ma, Shuang
Ru, Hua
Ni, Hongyi
Li, Jingsong
MEDINFO 2023 - THE FUTURE IS ACCESSIBLE, 2024, 310 : 134 - 138
[9] Biases in Electronic Health Records Data for Generating Real-World Evidence: An Overview
Al-Sahab, Ban
Leviton, Alan
Loddenkemper, Tobias
Paneth, Nigel
Zhang, Bo
JOURNAL OF HEALTHCARE INFORMATICS RESEARCH, 2024, 8 (01) : 121 - 139
[10] Biases in Electronic Health Records Data for Generating Real-World Evidence: An Overview
Ban Al-Sahab
Alan Leviton
Tobias Loddenkemper
Nigel Paneth
Bo Zhang
Journal of Healthcare Informatics Research, 2024, 8 : 121 - 139
