Unsupervised anomaly detection of implausible electronic health records: a real-world evaluation in cancer registries

Cited by: 4
Authors
Roechner, Philipp [1]
Rothlauf, Franz [1]
Affiliations
[1] Johannes Gutenberg Univ Mainz, Informat Syst & Business Adm, Jakob Welder Weg 9, D-55128 Mainz, Germany
Keywords
Anomaly detection; Outlier detection; Data quality; Quality control; Electronic health records; Medical records; Cancer registration; Neural network; Machine learning; Artificial intelligence
DOI
10.1186/s12874-023-01946-0
Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)]
Abstract
Background: Cancer registries collect patient-specific information about cancer diseases. The collected information is verified and made available to clinical researchers, physicians, and patients. When processing information, cancer registries verify that the patient-specific records they collect are plausible, meaning that the collected information about a particular patient makes medical sense.

Methods: Unsupervised machine learning approaches can detect implausible electronic health records without human guidance. Therefore, this article investigates two unsupervised anomaly detection approaches, a pattern-based approach (FindFPOF) and a compression-based approach (autoencoder), to identify implausible electronic health records in cancer registries. Unlike most existing work, which analyzes synthetic anomalies, we compare the performance of both approaches and a baseline (random selection of records) on a real-world dataset. The dataset contains 21,104 electronic health records of patients with breast, colorectal, and prostate tumors. Each record consists of 16 categorical variables describing the disease, the patient, and the diagnostic procedure. The samples identified by FindFPOF, the autoencoder, and a random selection (a total of 785 different records) are evaluated in a real-world scenario by medical domain experts.

Results: Both anomaly detection methods are good at detecting implausible electronic health records. First, domain experts identified 8% of 300 randomly selected records as implausible. With FindFPOF and the autoencoder, 28% of the proposed 300 records in each sample were implausible. This corresponds to a precision of 28% for both FindFPOF and the autoencoder. Second, for 300 randomly selected records that were labeled by domain experts, the sensitivity of the autoencoder was 22% and the sensitivity of FindFPOF was 26%. Both anomaly detection methods had a specificity of 94%. Third, FindFPOF and the autoencoder suggested samples with a different distribution of values than the overall dataset. For example, both anomaly detection methods suggested a higher proportion of colorectal records, the tumor localization with the highest percentage of implausible records in a randomly selected sample.

Conclusions: Unsupervised anomaly detection can significantly reduce the manual effort of domain experts to find implausible electronic health records in cancer registries. In our experiments, the manual effort was reduced by a factor of approximately 3.5 compared to evaluating a random sample.
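The effort-reduction factor reported in the abstract can be reproduced from the stated percentages. The sketch below is illustrative only: the record counts are derived by rounding 8% and 28% of 300 (an assumption, since the abstract reports percentages rather than counts), and the function names are hypothetical, not taken from the paper.

```python
def precision(implausible_found: int, records_reviewed: int) -> float:
    """Share of reviewed records that experts judged implausible."""
    return implausible_found / records_reviewed


SAMPLE_SIZE = 300  # records per sample, as stated in the abstract

# Assumption: counts reconstructed by rounding the reported percentages.
random_hits = round(0.08 * SAMPLE_SIZE)    # random baseline, 8% implausible
detector_hits = round(0.28 * SAMPLE_SIZE)  # FindFPOF or autoencoder, 28% implausible

baseline_precision = precision(random_hits, SAMPLE_SIZE)
detector_precision = precision(detector_hits, SAMPLE_SIZE)

# For the same number of reviewed records, the detectors surface this many
# times more implausible records than random selection does.
effort_reduction = detector_hits / random_hits

print(f"baseline precision: {baseline_precision:.2f}")
print(f"detector precision: {detector_precision:.2f}")
print(f"effort reduction factor: {effort_reduction:.1f}")
```

Read this way, the factor of approximately 3.5 is the ratio of the two precisions: an expert reviewing detector-suggested records finds an implausible record about 3.5 times as often as when reviewing randomly drawn records.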
Pages: 14