Unsupervised anomaly detection of implausible electronic health records: a real-world evaluation in cancer registries

Cited by: 4
Authors
Roechner, Philipp [1]
Rothlauf, Franz [1]
Affiliations
[1] Johannes Gutenberg Univ Mainz, Informat Syst & Business Adm, Jakob Welder Weg 9, D-55128 Mainz, Germany
Keywords
Anomaly detection; Outlier detection; Data quality; Quality control; Electronic health records; Medical records; Cancer registration; Neural network; Machine learning; Artificial intelligence
DOI
10.1186/s12874-023-01946-0
Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)]
Abstract
Background: Cancer registries collect patient-specific information about cancer diseases. The collected information is verified and made available to clinical researchers, physicians, and patients. When processing information, cancer registries verify that the patient-specific records they collect are plausible. This means that the collected information about a particular patient makes medical sense.

Methods: Unsupervised machine learning approaches can detect implausible electronic health records without human guidance. Therefore, this article investigates two unsupervised anomaly detection approaches, a pattern-based approach (FindFPOF) and a compression-based approach (autoencoder), to identify implausible electronic health records in cancer registries. Unlike most existing work that analyzes synthetic anomalies, we compare the performance of both approaches and a baseline (random selection of records) on a real-world dataset. The dataset contains 21,104 electronic health records of patients with breast, colorectal, and prostate tumors. Each record consists of 16 categorical variables describing the disease, the patient, and the diagnostic procedure. The samples identified by FindFPOF, the autoencoder, and a random selection (a total of 785 different records) are evaluated in a real-world scenario by medical domain experts.

Results: Both anomaly detection methods are good at detecting implausible electronic health records. First, domain experts identified 8% of 300 randomly selected records as implausible. With FindFPOF and the autoencoder, 28% of the proposed 300 records in each sample were implausible. This corresponds to a precision of 28% for FindFPOF and the autoencoder. Second, for 300 randomly selected records that were labeled by domain experts, the sensitivity of the autoencoder was 22% and the sensitivity of FindFPOF was 26%. Both anomaly detection methods had a specificity of 94%. Third, FindFPOF and the autoencoder suggested samples with a different distribution of values than the overall dataset. For example, both anomaly detection methods suggested a higher proportion of colorectal records, the tumor localization with the highest percentage of implausible records in a randomly selected sample.

Conclusions: Unsupervised anomaly detection can significantly reduce the manual effort of domain experts to find implausible electronic health records in cancer registries. In our experiments, the manual effort was reduced by a factor of approximately 3.5 compared to evaluating a random sample.
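The reported effort reduction can be retraced from the figures in the abstract. The following Python sketch is only an illustration, not the authors' evaluation code: the function names are hypothetical, and the absolute counts (24 and 84) are back-calculated from the rounded percentages 8% and 28% of 300 records, so they are approximations. The abstract does not give the confusion counts behind the sensitivity and specificity values, so those two functions are shown as definitions only.

```python
def precision(true_positives: int, flagged: int) -> float:
    """Share of flagged records that experts confirmed as implausible."""
    return true_positives / flagged


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of truly implausible records that the method flagged."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Share of plausible records that the method left unflagged."""
    return true_negatives / (true_negatives + false_positives)


SAMPLE_SIZE = 300  # records per evaluated sample, as stated in the abstract

# Assumption: counts derived from the rounded percentages in the abstract.
IMPLAUSIBLE_IN_RANDOM_SAMPLE = 24   # about 8% of 300
IMPLAUSIBLE_IN_FLAGGED_SAMPLE = 84  # about 28% of 300

baseline_rate = precision(IMPLAUSIBLE_IN_RANDOM_SAMPLE, SAMPLE_SIZE)
detector_precision = precision(IMPLAUSIBLE_IN_FLAGGED_SAMPLE, SAMPLE_SIZE)

# Experts find about 3.5 times as many implausible records per reviewed
# record when they review flagged records instead of a random sample.
effort_reduction = detector_precision / baseline_rate
print(f"{effort_reduction:.1f}")  # prints 3.5
```

Read this way, the factor of approximately 3.5 is the ratio of the detectors' precision (28%) to the rate of implausible records in a random sample (8%).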
Pages: 14