Challenges in clinical natural language processing for automated disorder normalization

Cited by: 95

Authors:

Leaman, Robert [1]

Khare, Ritu [1]

Lu, Zhiyong [1]

Affiliation:

[1] NIH, NCBI, NLM, Bethesda, MD 20894 USA

Source:

JOURNAL OF BIOMEDICAL INFORMATICS | 2015 / Volume 57

Keywords:

Natural language processing; Electronic health records; Information extraction; Text; UMLS

DOI:

10.1016/j.jbi.2015.07.010

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification code:

081203; 0835

Abstract:

Background: Identifying key variables such as disorders within the clinical narratives in electronic health records has wide-ranging applications within clinical practice and biomedical research. Previous research has shown that disorder named entity recognition (NER) and normalization (or grounding) perform worse in clinical narratives than in biomedical publications. In this work, we aim to identify the cause of this performance difference and introduce general solutions.

Methods: We use closure properties to compare the richness of the vocabulary in clinical narrative text to biomedical publications. We approach both disorder NER and normalization using machine learning methodologies. Our NER methodology is based on linear-chain conditional random fields with a rich feature approach, and we introduce several improvements to enhance the lexical knowledge of the NER system. Our normalization method, never previously applied to clinical data, uses pairwise learning to rank to automatically learn term variation directly from the training data.

Results: We find that while the size of the overall vocabulary is similar between clinical narrative and biomedical publications, clinical narrative uses a richer terminology to describe disorders than publications. We apply our system, DNorm-C, to locate disorder mentions in the clinical narratives from the recent ShARe/CLEF eHealth Task. For NER (strict span-only), our system achieves precision = 0.797, recall = 0.713, f-score = 0.753. For the normalization task (strict span + concept) it achieves precision = 0.712, recall = 0.637, f-score = 0.672. The improvements described in this article increase the NER f-score by 0.039 and the normalization f-score by 0.036. We also describe a high recall version of the NER, which increases the normalization recall to as high as 0.744, albeit with reduced precision.

Discussion: We perform an error analysis, demonstrating that NER errors outnumber normalization errors by more than 4-to-1. Abbreviations and acronyms are found to be frequent causes of error, in addition to mentions that the annotators were not able to identify within the scope of the controlled vocabulary.

Conclusion: Disorder mentions in text from clinical narratives use a rich vocabulary that results in high term variation, which we believe to be one of the primary causes of reduced performance in clinical narrative. We show that pairwise learning to rank offers high performance in this context, and introduce several lexical enhancements, generalizable to other clinical NER tasks, that improve the ability of the NER system to handle this variation. DNorm-C is a high performing, open source system for disorders in clinical text, and a promising step toward NER and normalization methods that are trainable for a wide variety of domains and entities. (DNorm-C is open source software, and is available with a trained model at the DNorm demonstration website: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#DNorm.) Published by Elsevier Inc.
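
The f-scores reported in the abstract can be checked against the reported precision and recall. The abstract does not state which f-score variant is used; the sketch below assumes the balanced F1 (harmonic mean of precision and recall), and the function name `f_score` is purely illustrative. Under that assumption the reported values are reproduced to three decimal places.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "f-score" is the balanced F1 measure.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
ner_f = f_score(0.797, 0.713)    # NER, strict span-only
norm_f = f_score(0.712, 0.637)   # normalization, strict span + concept

print(f"NER f-score:           {ner_f:.3f}")
print(f"Normalization f-score: {norm_f:.3f}")
```

Rounded to three decimals, these come out to 0.753 and 0.672, matching the figures in the abstract.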


Pages: 28-37

Number of pages: 10
