Unsupervised anomaly detection of implausible electronic health records: a real-world evaluation in cancer registries

Cited by: 4
Authors
Roechner, Philipp [1]
Rothlauf, Franz [1]
Affiliations
[1] Johannes Gutenberg Univ Mainz, Informat Syst & Business Adm, Jakob Welder Weg 9, D-55128 Mainz, Germany
Keywords
Anomaly detection; Outlier detection; Data quality; Quality control; Electronic health records; Medical records; Cancer registration; Neural network; Machine learning; Artificial intelligence;
DOI
10.1186/s12874-023-01946-0
Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)]
Abstract
Background: Cancer registries collect patient-specific information about cancer diseases. The collected information is verified and made available to clinical researchers, physicians, and patients. When processing information, cancer registries verify that the patient-specific records they collect are plausible, meaning that the collected information about a particular patient makes medical sense.
Methods: Unsupervised machine learning approaches can detect implausible electronic health records without human guidance. This article therefore investigates two unsupervised anomaly detection approaches, a pattern-based approach (FindFPOF) and a compression-based approach (autoencoder), to identify implausible electronic health records in cancer registries. Unlike most existing work, which analyzes synthetic anomalies, we compare the performance of both approaches and a baseline (random selection of records) on a real-world dataset. The dataset contains 21,104 electronic health records of patients with breast, colorectal, and prostate tumors. Each record consists of 16 categorical variables describing the disease, the patient, and the diagnostic procedure. The samples identified by FindFPOF, the autoencoder, and random selection (a total of 785 different records) are evaluated in a real-world scenario by medical domain experts.
Results: Both anomaly detection methods detected implausible electronic health records considerably more often than random selection. First, domain experts identified 8% of 300 randomly selected records as implausible. Of the 300 records proposed by FindFPOF and the 300 proposed by the autoencoder, 28% in each sample were implausible, which corresponds to a precision of 28% for both methods. Second, for the 300 randomly selected records that were labeled by domain experts, the sensitivity of the autoencoder was 22% and the sensitivity of FindFPOF was 26%. Both anomaly detection methods had a specificity of 94%. Third, FindFPOF and the autoencoder suggested samples with a different distribution of values than the overall dataset. For example, both anomaly detection methods suggested a higher proportion of colorectal records, the tumor localization with the highest percentage of implausible records in a randomly selected sample.
Conclusions: Unsupervised anomaly detection can significantly reduce the manual effort domain experts need to find implausible electronic health records in cancer registries. In our experiments, the manual effort was reduced by a factor of approximately 3.5 compared to evaluating a random sample.
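The evaluation measures named in the abstract (precision, sensitivity, specificity) and the reported effort reduction can be reproduced with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper. The function names are assumptions, and the confusion-matrix counts are hypothetical values chosen to exercise the formulas. Only the 28% and 8% figures come from the abstract.

```python
def precision(tp: int, fp: int) -> float:
    """Share of flagged records that experts judged implausible."""
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Share of truly implausible records that the detector flagged."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of plausible records that the detector left unflagged."""
    return tn / (tn + fp)


def effort_reduction(detector_precision_pct: float, baseline_rate_pct: float) -> float:
    """Ratio of implausible records found per reviewed record,
    detector sample versus random sample."""
    return detector_precision_pct / baseline_rate_pct


# Figures reported in the abstract: 28% precision for both detectors,
# 8% implausible records in the random sample.
print(effort_reduction(28, 8))  # 3.5

# Hypothetical counts (NOT from the paper), used only to show the formulas.
tp, fp, fn, tn = 10, 30, 40, 920
print(precision(tp, fp))     # 0.25
print(sensitivity(tp, fn))   # 0.2
print(specificity(tn, fp))
```

Under this reading, the factor of approximately 3.5 is the ratio 28% / 8%: reviewing a detector-proposed sample yields about 3.5 times as many implausible records per reviewed record as reviewing a random sample.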
Pages: 14