Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records

Cited by: 4
Authors
Berge, Geir Thore [1 ,2 ]
Granmo, Ole-Christoffer [3 ]
Tveit, Tor Oddbjorn [2 ,4 ]
Ruthjersen, Anna Linda [2 ]
Sharma, Jivitesh [2 ,3 ]
Affiliations
[1] Univ Agder, Dept Informat Syst, Kristiansand, Norway
[2] Sorlandet Hosp Trust, Dept Technol & eHlth, Kristiansand, Norway
[3] Univ Agder, Dept ICT, Grimstad, Norway
[4] Sorlandet Hosp Trust, Dept Anesthesia & Intens Care, Kristiansand, Norway
Keywords
Natural language processing; Electronic health records; Machine learning; Supervised; Unsupervised; Rule-based; Classification; Clinical decisions support systems; DECISION-SUPPORT-SYSTEMS; EXTRACTION SYSTEM; BIG-DATA; IDENTIFICATION; ASSERTIONS; KNOWLEDGE; DISEASES; EVENTS; MODELS;
DOI
10.1186/s12911-023-02271-8
Chinese Library Classification (CLC) number
R-058
Abstract
Background: Data mining of electronic health records (EHRs) has great potential to improve clinical decision support and to help healthcare deliver precision medicine. Unfortunately, the rule-based and machine learning-based approaches used for natural language processing (NLP) in healthcare today all have shortcomings in performance, efficiency, or transparency.
Methods: In this paper, we address these issues by presenting a novel method for NLP that combines unsupervised learning of word embeddings, semi-supervised learning for simplified and accelerated building of clinical vocabulary and concepts, and deterministic rules for fine-grained control of information extraction. The clinical language is learnt automatically, and the vocabulary, concepts, and rules that support a variety of downstream NLP tasks can then be built with only minimal manual feature engineering and tagging by clinical experts. Together, these steps form an open processing pipeline that refines the data gradually, so that data transformations are transparent and predictions are interpretable, which is imperative for healthcare. The combined method has other advantages as well: it is potentially language independent, demands few domain resources for maintenance, and can handle misspellings, abbreviations, and acronyms. To test and evaluate the combined method, we have developed a clinical decision support system (CDSS) named Information System for Clinical Concept Searching (ICCS) that implements the method for clinical concept tagging, extraction, and classification.
Results: In empirical studies the method shows high performance (recall 92.6%, precision 88.8%, F-measure 90.7%) and has demonstrated its value to clinical practice. Here we employ a real-life EHR-derived dataset to evaluate the method's performance on a classification task (detecting patient allergies) against a range of common supervised learning algorithms. The combined method achieves state-of-the-art performance compared to the alternative methods we evaluate. We also perform a qualitative analysis of common word embedding methods on a word similarity task to examine their potential for supporting automatic feature engineering for clinical NLP tasks.
Conclusions: Based on the promising results, we suggest that more research should be aimed at exploiting the inherent synergies between unsupervised, supervised, and rule-based paradigms for clinical NLP.
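As a consistency check on the figures quoted in the abstract, the short Python sketch below recomputes the F-measure from the stated precision and recall. It assumes the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not say which variant was used, and the function name `f_measure` is illustrative only, not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1.0 gives the harmonic mean (F1).

    Assumption: the abstract's "F-measure" is F1. This helper is a
    hypothetical illustration, not code from the paper.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both inputs are zero
    beta_sq = beta * beta
    return (1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Values as reported in the abstract.
reported_precision = 0.888
reported_recall = 0.926

f1 = f_measure(reported_precision, reported_recall)
print(f"F1 = {f1:.1%}")  # prints "F1 = 90.7%"
```

Under that assumption, the recomputed value agrees with the reported 90.7% to one decimal place.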
Pages: 25
Related papers
50 in total
  • [31] Combining Rule-Based System and Machine Learning to Classify Semi-natural Language Data
    Hussain, Zafar
    Nurminen, Jukka K.
    Mikkonen, Tommi
    Kowiel, Marcin
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 1, 2023, 542 : 424 - 441
  • [32] Combining a rule-based expert system and machine learning in a simulated mobile robot control system
    Foster, K
    Hendtlass, T
    DESIGN AND APPLICATION OF HYBRID INTELLIGENT SYSTEMS, 2003, 104 : 361 - 370
  • [33] Rule-Based System with Machine Learning Support for Detecting Anomalies in 5G WLANs
    Uszko, Krzysztof
    Kasprzyk, Maciej
    Natkaniec, Marek
    Cholda, Piotr
    ELECTRONICS, 2023, 12 (11)
  • [34] Electronic health records based reinforcement learning for treatment optimizing
    Li, Tianhao
    Wang, Zhishun
    Lu, Wei
    Zhang, Qian
    Li, Dengfeng
    INFORMATION SYSTEMS, 2022, 104
  • [35] Combining a Rule-based Classifier with Ensemble of Feature Sets and Machine Learning Techniques for Sentiment Analysis on Microblog
    Siddiqua, Umme Aymun
    Ahsan, Tanveer
    Chy, Abu Nowshed
    PROCEEDINGS OF THE 2016 19TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 2016, : 304 - 309
  • [36] Semi-supervised Double Deep Learning Temporal Risk Prediction (SeDDLeR) with Electronic Health Records
    Nogues, Isabelle-Emmanuella
    Wen, Jun
    Zhao, Yihan
    Bonzel, Clara-Lea
    Castro, Victor M.
    Lin, Yucong
    Xu, Shike
    Hou, Jue
    Cai, Tianxi
    JOURNAL OF BIOMEDICAL INFORMATICS, 2024, 157
  • [37] A rule-based machine learning methodology for the proactive improvement of OEE: a real case study
    Lucantoni, Laura
    Antomarioni, Sara
    Ciarapica, Filippo Emanuele
    Bevilacqua, Maurizio
    INTERNATIONAL JOURNAL OF QUALITY & RELIABILITY MANAGEMENT, 2024, 41 (05) : 1356 - 1376
  • [38] Automatic De-Identification of French Clinical Records: Comparison of Rule-Based and Machine-Learning Approaches
    Grouin, Cyril
    Zweigenbaum, Pierre
    MEDINFO 2013: PROCEEDINGS OF THE 14TH WORLD CONGRESS ON MEDICAL AND HEALTH INFORMATICS, PTS 1 AND 2, 2013, 192 : 476 - 480
  • [39] Patient perspectives on acceptability of, and implementation preferences for, use of electronic health records and machine learning to identify suicide risk
    Yarborough, Bobbi Jo H.
    Stumbo, Scott P.
    GENERAL HOSPITAL PSYCHIATRY, 2021, 70 : 31 - 37
  • [40] Multi-perspective patient representation learning for disease prediction on electronic health records
    Yu, Ziyue
    Wang, Jiayi
    Luo, Wuman
    Tse, Rita
    Pau, Giovanni
    KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (12) : 7837 - 7858