Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records

Cited by: 4
Authors
Berge, Geir Thore [1, 2]
Granmo, Ole-Christoffer [3]
Tveit, Tor Oddbjorn [2, 4]
Ruthjersen, Anna Linda [2]
Sharma, Jivitesh [2, 3]
Affiliations
[1] Univ Agder, Dept Informat Syst, Kristiansand, Norway
[2] Sorlandet Hosp Trust, Dept Technol & eHlth, Kristiansand, Norway
[3] Univ Agder, Dept ICT, Grimstad, Norway
[4] Sorlandet Hosp Trust, Dept Anesthesia & Intens Care, Kristiansand, Norway
Keywords
Natural language processing; Electronic health records; Machine learning; Supervised; Unsupervised; Rule-based; Classification; Clinical decision support systems; DECISION-SUPPORT-SYSTEMS; EXTRACTION SYSTEM; BIG-DATA; IDENTIFICATION; ASSERTIONS; KNOWLEDGE; DISEASES; EVENTS; MODELS
DOI
10.1186/s12911-023-02271-8
Chinese Library Classification (CLC) number
R-058
Abstract
Background: Data mining of electronic health records (EHRs) has huge potential for improving clinical decision support and for helping healthcare deliver precision medicine. Unfortunately, the rule-based and machine learning-based approaches used for natural language processing (NLP) in healthcare today all struggle with shortcomings related to performance, efficiency, or transparency.

Methods: In this paper, we address these issues by presenting a novel method for NLP that implements unsupervised learning of word embeddings, semi-supervised learning for simplified and accelerated clinical vocabulary and concept building, and deterministic rules for fine-grained control of information extraction. The clinical language is learnt automatically, and vocabulary, concepts, and rules supporting a variety of downstream NLP tasks can then be built with only minimal manual feature engineering and tagging required from clinical experts. Together, these steps create an open processing pipeline that gradually refines the data in a transparent way, which greatly improves the interpretability of our method. Data transformations are thus made transparent and predictions interpretable, which is imperative for healthcare. The combined method also has other advantages, such as potentially being language independent, demanding few domain resources for maintenance, and being able to cover misspellings, abbreviations, and acronyms. To test and evaluate the combined method, we have developed a clinical decision support system (CDSS) named Information System for Clinical Concept Searching (ICCS) that implements the method for clinical concept tagging, extraction, and classification.

Results: In empirical studies the method shows high performance (recall 92.6%, precision 88.8%, F-measure 90.7%), and has demonstrated its value to clinical practice. Here we employ a real-life EHR-derived dataset to evaluate the method's performance on the task of classification (i.e., detecting patient allergies) against a range of common supervised learning algorithms. The combined method achieves state-of-the-art performance compared to the alternative methods we evaluate. We also perform a qualitative analysis of common word embedding methods on the task of word similarity to examine their potential for supporting automatic feature engineering for clinical NLP tasks.

Conclusions: Based on the promising results, we suggest more research should be aimed at exploiting the inherent synergies between unsupervised, supervised, and rule-based paradigms for clinical NLP.
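As a quick consistency check on the figures reported in the Results, the sketch below recomputes the F-measure from the stated precision and recall. It assumes the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is the balanced F1 score.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract: precision 88.8%, recall 92.6%.
f1 = f_measure(precision=0.888, recall=0.926)
print(f"{f1:.1%}")  # prints 90.7%
```

Under that assumption, the recomputed value agrees with the reported 90.7%.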
Pages: 25
Journal
BMC Medical Informatics and Decision Making, 23