Identifying stroke-related quantified evidence from electronic health records in real-world studies

Cited by: 7
Authors
Yang, Lin [1, 2]
Huang, Xiaoshuo [1, 3]
Wang, Jiayang [1]
Yang, Xin [4, 5]
Ding, Lingling [4]
Li, Zixiao [4, 5, 6]
Li, Jiao [1, 2]
Affiliations
[1] Chinese Acad Med Sci, Peking Union Med Coll, Inst Med Informat & Lib, 3 Yabao Rd, Beijing 100020, Peoples R China
[2] Chinese Acad Med Sci, Key Lab Med Informat Intelligent Technol, Beijing 100020, Peoples R China
[3] Dalian Neusoft Univ Informat, Sch Hlth Care Technol, Dalian 116023, Peoples R China
[4] Capital Med Univ, Beijing Tiantan Hosp, China Natl Clin Res Ctr Neurol Dis, Beijing 100070, Peoples R China
[5] Capital Med Univ, Beijing Tiantan Hosp, Natl Ctr Healthcare Qual Management Neurol Dis, Beijing 100070, Peoples R China
[6] Capital Med Univ, Beijing Tiantan Hosp, Dept Neurol, Beijing 100070, Peoples R China
Keywords
Stroke; NIHSS; Information extraction; Reuse of electronic health records; Real-world study; ACUTE ISCHEMIC-STROKE; INFORMATION EXTRACTION; CARE PROFESSIONALS; EARLY MANAGEMENT; RETROSPECTIVE ASSESSMENT; RISK-FACTORS; GUIDELINES; ENTITY; EPIDEMIOLOGY; SEVERITY
DOI
10.1016/j.artmed.2023.102552
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Background: Stroke is one of the leading causes of death and disability worldwide. National Institutes of Health Stroke Scale (NIHSS) scores in electronic health records (EHRs), which quantitatively describe patients' neurological deficits in evidence-based treatment, are crucial in stroke-related clinical investigations. However, their free-text format and lack of standardization inhibit their effective use. Automatically extracting the scale scores from clinical free text, so that their potential value in real-world studies can be realized, has become an important goal.
Objective: This study aims to develop an automated method to extract scale scores from the free text of EHRs.
Methods: We propose a two-step pipeline method to identify NIHSS items and numerical scores, and validate its feasibility using a freely accessible critical care database, MIMIC-III (Medical Information Mart for Intensive Care III). First, we use MIMIC-III to create an annotated corpus. Then, we investigate possible machine learning methods for two subtasks: NIHSS item and score recognition, and item-score relation extraction. We conduct both task-specific and end-to-end evaluations and compare our method with a rule-based method, using precision, recall and F1-score as evaluation metrics.
Results: We use all available discharge summaries of stroke cases in MIMIC-III. The annotated NIHSS corpus contains 312 cases, 2929 scale items, 2774 scores and 2733 relations. The best F1-score of our method was 0.9006, attained by combining BERT-BiLSTM-CRF and Random Forest, and it outperformed the rule-based method (F1-score = 0.8098). In the end-to-end task, our method successfully recognized the item "1b level of consciousness questions", the score "1" and their relation "('1b level of consciousness questions', '1', 'has value')" from the sentence "1b level of consciousness questions: said name = 1", while the rule-based method could not.
Conclusions: The two-step pipeline method we propose is an effective approach to identifying NIHSS items, scores and their relations. With its help, clinical investigators can easily retrieve and access structured scale data, thereby supporting stroke-related real-world studies.
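The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes exact-match scoring over (item, score, relation) triples, which the abstract does not confirm; the function name `prf1` and the second gold/predicted triple are hypothetical and serve only as illustration. Only the "1b level of consciousness questions" triple comes from the abstract.

```python
from typing import Iterable, Tuple

# (item, score, relation) triple, e.g. the example quoted in the abstract.
Triple = Tuple[str, str, str]


def prf1(gold: Iterable[Triple], predicted: Iterable[Triple]) -> Tuple[float, float, float]:
    """Precision, recall and F1 under exact triple matching (an assumption)."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# The first triple is taken from the abstract; the second is made up.
gold = [
    ("1b level of consciousness questions", "1", "has value"),
    ("1a level of consciousness", "0", "has value"),
]
predicted = [
    ("1b level of consciousness questions", "1", "has value"),
    ("1a level of consciousness", "1", "has value"),  # wrong score, so no match
]

precision, recall, f1 = prf1(gold, predicted)
print(precision, recall, f1)
```

Under exact matching, a triple with the right item but the wrong score counts as both a false positive and a false negative, which is why end-to-end scores tend to be lower than the scores of the individual subtasks.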
Pages: 13