Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records

Cited by: 4

Authors:

Berge, Geir Thore [1,2]

Granmo, Ole-Christoffer [3]

Tveit, Tor Oddbjorn [2,4]

Ruthjersen, Anna Linda [2]

Sharma, Jivitesh [2,3]

Affiliations:

[1] Univ Agder, Dept Informat Syst, Kristiansand, Norway

[2] Sorlandet Hosp Trust, Dept Technol & eHlth, Kristiansand, Norway

[3] Univ Agder, Dept ICT, Grimstad, Norway

[4] Sorlandet Hosp Trust, Dept Anesthesia & Intens Care, Kristiansand, Norway

Source:

BMC MEDICAL INFORMATICS AND DECISION MAKING | 2023 / Volume 23 / Issue 01

Keywords:

Natural language processing; Electronic health records; Machine learning; Supervised; Unsupervised; Rule-based; Classification; Clinical decisions support systems; DECISION-SUPPORT-SYSTEMS; EXTRACTION SYSTEM; BIG-DATA; IDENTIFICATION; ASSERTIONS; KNOWLEDGE; DISEASES; EVENTS; MODELS;

DOI:

10.1186/s12911-023-02271-8

Chinese Library Classification number:

R-058

Discipline classification code:

Abstract:

Background: Data mining of electronic health records (EHRs) has a huge potential for improving clinical decision support and to help healthcare deliver precision medicine. Unfortunately, the rule-based and machine learning-based approaches used for natural language processing (NLP) in healthcare today all struggle with various shortcomings related to performance, efficiency, or transparency.

Methods: In this paper, we address these issues by presenting a novel method for NLP that implements unsupervised learning of word embeddings, semi-supervised learning for simplified and accelerated clinical vocabulary and concept building, and deterministic rules for fine-grained control of information extraction. The clinical language is automatically learnt, and vocabulary, concepts, and rules supporting a variety of NLP downstream tasks can further be built with only minimal manual feature engineering and tagging required from clinical experts. Together, these steps create an open processing pipeline that gradually refines the data in a transparent way, which greatly improves the interpretable nature of our method. Data transformations are thus made transparent and predictions interpretable, which is imperative for healthcare. The combined method also has other advantages, like potentially being language independent, demanding few domain resources for maintenance, and able to cover misspellings, abbreviations, and acronyms. To test and evaluate the combined method, we have developed a clinical decision support system (CDSS) named Information System for Clinical Concept Searching (ICCS) that implements the method for clinical concept tagging, extraction, and classification.

Results: In empirical studies the method shows high performance (recall 92.6%, precision 88.8%, F-measure 90.7%), and has demonstrated its value to clinical practice. Here we employ a real-life EHR-derived dataset to evaluate the method's performance on the task of classification (i.e., detecting patient allergies) against a range of common supervised learning algorithms. The combined method achieves state-of-the-art performance compared to the alternative methods we evaluate. We also perform a qualitative analysis of common word embedding methods on the task of word similarity to examine their potential for supporting automatic feature engineering for clinical NLP tasks.

Conclusions: Based on the promising results, we suggest more research should be aimed at exploiting the inherent synergies between unsupervised, supervised, and rule-based paradigms for clinical NLP.
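As a quick consistency check on the reported figures, the sketch below recomputes the F-measure from the stated recall and precision. It assumes the abstract's "F-measure" is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1.0 gives the balanced F1 score.

    Assumption: the abstract's "F-measure" is F1. This helper is a
    hypothetical illustration, not code from the paper.
    """
    if precision <= 0.0 or recall <= 0.0:
        # With no correct positive predictions the score is defined as 0.
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Figures quoted in the abstract, expressed as fractions.
reported_precision = 0.888
reported_recall = 0.926

f1 = f_measure(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.907"
```

Under that assumption, the recomputed value rounds to the 90.7% given in the abstract.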


Pages: 25

50 items in total

[21] Study on the semi-supervised learning-based patient similarity from heterogeneous electronic medical records
Wang, Ni
Huang, Yanqun
Liu, Honglei
Zhang, Zhiqiang
Wei, Lan
Fei, Xiaolu
Chen, Hui
BMC MEDICAL INFORMATICS AND DECISION MAKING, 2021, 21 (SUPPL 2)
[22] An unsupervised learning approach to identify immunoglobulin utilization patterns using electronic health records
Riazi, Kiarash
Ly, Mark
Barty, Rebecca
Callum, Jeannie
Arnold, Donald M.
Heddle, Nancy M.
Down, Douglas G.
Sidhu, Davinder
Li, Na
TRANSFUSION, 2023, 63 (12) : 2234 - 2247
[23] Advances in Electronic Phenotyping: From Rule-Based Definitions to Machine Learning Models
Banda, Juan M.
Seneviratne, Martin
Hernandez-Boussard, Tina
Shah, Nigam H.
ANNUAL REVIEW OF BIOMEDICAL DATA SCIENCE, VOL 1, 2018, 1 : 53 - 68
[24] A Deep Learning-Based Unsupervised Method to Impute Missing Values in Patient Records for Improved Management of Cardiovascular Patients
Xu, Da
Sheng, Jessica Qiuhua
Hu, Paul Jen-Hwa
Huang, Ting-Shuo
Hsu, Chih-Chin
IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2021, 25 (06) : 2260 - 2272
[25] A Rule-Based Data Quality Assessment System for Electronic Health Record Data
Wang, Zhan
Talburt, John R.
Wu, Ningning
Dagtas, Serhan
Zozus, Meredith Nahm
APPLIED CLINICAL INFORMATICS, 2020, 11 (04) : 622 - 634
[26] Guiding urban self-organization: Combining rule-based and case-based planning
Partanen, J.
ENVIRONMENT AND PLANNING B-URBAN ANALYTICS AND CITY SCIENCE, 2020, 47 (02) : 304 - 320
[27] SCEHR: Supervised Contrastive Learning for Clinical Risk Prediction using Electronic Health Records
Zang, Chengxi
Wang, Fei
2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021), 2021 : 857 - 866
[28] Using Rule-Based Labels for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction
Goh, Garrett B.
Siegel, Charles
Vishnu, Abhinav
Hodas, Nathan
KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2018 : 302 - 310
[29] Patient-oriented unsupervised learning to uncover the patterns of multimorbidity associated with stroke using primary care electronic health records
Delord, Marc
Sun, Xiaohui
Learoyd, Annastazia
Curcin, Vasa
Wolfe, Charles
Ashworth, Mark
Douiri, Abdel
BMC PRIMARY CARE, 2024, 25 (01)
[30] Detecting critical diseases associated with higher mortality in electronic health records using a hybrid attention-based transformer
Kodati, Dheeraj
Dasari, Chandra Mohan
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 139
