Challenges in clinical natural language processing for automated disorder normalization

Cited by: 95
Authors
Leaman, Robert [1]
Khare, Ritu [1]
Lu, Zhiyong [1]
Affiliations
[1] NIH, NCBI, NLM, Bethesda, MD 20894 USA
Keywords
Natural language processing; Electronic health records; Information extraction; Text; UMLS
DOI
10.1016/j.jbi.2015.07.010
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Background: Identifying key variables such as disorders within the clinical narratives in electronic health records has wide-ranging applications within clinical practice and biomedical research. Previous research has demonstrated that disorder named entity recognition (NER) and normalization (or grounding) perform worse in clinical narratives than in biomedical publications. In this work, we aim to identify the cause of this performance difference and introduce general solutions.
Methods: We use closure properties to compare the richness of the vocabulary in clinical narrative text to that of biomedical publications. We approach both disorder NER and normalization using machine learning methodologies. Our NER methodology is based on linear-chain conditional random fields with a rich feature approach, and we introduce several improvements to enhance the lexical knowledge of the NER system. Our normalization method - never previously applied to clinical data - uses pairwise learning to rank to automatically learn term variation directly from the training data.
Results: We find that while the size of the overall vocabulary is similar between clinical narrative and biomedical publications, clinical narrative uses a richer terminology to describe disorders than publications. We apply our system, DNorm-C, to locate and normalize disorder mentions in the clinical narratives from the recent ShARe/CLEF eHealth Task. For NER (strict span-only), our system achieves precision = 0.797, recall = 0.713, f-score = 0.753. For the normalization task (strict span + concept) it achieves precision = 0.712, recall = 0.637, f-score = 0.672. The improvements described in this article increase the NER f-score by 0.039 and the normalization f-score by 0.036. We also describe a high recall version of the NER, which increases the normalization recall to as high as 0.744, albeit with reduced precision.
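As a quick consistency check on the reported figures, the sketch below recomputes the f-scores from the stated precision and recall. It assumes that "f-score" means the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` is hypothetical.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "f-score" is the balanced F1.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall exactly as reported in the abstract.
ner_f = f_score(0.797, 0.713)    # NER, strict span-only
norm_f = f_score(0.712, 0.637)   # normalization, strict span + concept

print(f"NER f-score:           {ner_f:.3f}")   # prints 0.753
print(f"Normalization f-score: {norm_f:.3f}")  # prints 0.672
```

Under that assumption, both recomputed values agree with the reported f-scores to three decimal places.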
Discussion: We perform an error analysis, demonstrating that NER errors outnumber normalization errors by more than 4-to-1. Abbreviations and acronyms are found to be frequent causes of error, as are mentions that the annotators were not able to identify within the scope of the controlled vocabulary.
Conclusion: Disorder mentions in text from clinical narratives use a rich vocabulary that results in high term variation, which we believe to be one of the primary causes of reduced performance in clinical narrative. We show that pairwise learning to rank offers high performance in this context, and introduce several lexical enhancements - generalizable to other clinical NER tasks - that improve the ability of the NER system to handle this variation. DNorm-C is a high performing, open source system for disorders in clinical text, and a promising step toward NER and normalization methods that are trainable to a wide variety of domains and entities. (DNorm-C is open source software, and is available with a trained model at the DNorm demonstration website: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#DNorm.) Published by Elsevier Inc.
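The idea of learning term variation through pairwise learning to rank can be illustrated with a toy sketch. This is not the DNorm-C implementation: the abstract does not specify the scoring function, so the token-pair weight matrix (initialized to the identity), the margin, the learning rate, the class name `PairwiseRanker`, and the three-name vocabulary are all assumptions made for illustration.

```python
from itertools import product


def tokens(text):
    """Lowercase whitespace tokenization (an illustrative simplification)."""
    return text.lower().split()


class PairwiseRanker:
    """Toy pairwise ranker over (mention token, name token) weights.

    Assumption: the score is the sum of weights over all token pairs, and
    the weights start as the identity, so identical tokens score 1.
    """

    def __init__(self, margin=1.0, learning_rate=0.5):
        self.margin = margin
        self.learning_rate = learning_rate
        self.delta = {}  # learned deviations from the identity weights

    def weight(self, a, b):
        return (1.0 if a == b else 0.0) + self.delta.get((a, b), 0.0)

    def score(self, mention, name):
        return sum(self.weight(a, b)
                   for a, b in product(tokens(mention), tokens(name)))

    def _shift(self, mention, name, step):
        for a, b in product(tokens(mention), tokens(name)):
            self.delta[(a, b)] = self.delta.get((a, b), 0.0) + step

    def train(self, examples, names, epochs=5):
        """For each (mention, correct name) pair, push the correct name
        above every other name by at least the margin."""
        for _ in range(epochs):
            for mention, correct in examples:
                for wrong in names:
                    if wrong == correct:
                        continue
                    gap = self.score(mention, correct) - self.score(mention, wrong)
                    if gap < self.margin:
                        self._shift(mention, correct, self.learning_rate)
                        self._shift(mention, wrong, -self.learning_rate)

    def rank(self, mention, names):
        return sorted(names, key=lambda n: self.score(mention, n), reverse=True)


# Hypothetical controlled-vocabulary names and training mentions.
names = ["myocardial infarction", "renal failure", "hypertension"]
examples = [
    ("heart attack", "myocardial infarction"),
    ("kidney failure", "renal failure"),
    ("high blood pressure", "hypertension"),
]

ranker = PairwiseRanker()
ranker.train(examples, names)
print(ranker.rank("heart attack", names)[0])         # prints myocardial infarction
print(ranker.rank("high blood pressure", names)[0])  # prints hypertension
```

Before training, "heart attack" shares no tokens with any name and all candidates tie at zero; the ranking updates are what associate "heart" and "attack" with "myocardial" and "infarction". A real normalization system would map the top-ranked name to its concept identifier in the controlled vocabulary.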
Pages: 28-37
Number of pages: 10