Challenges in clinical natural language processing for automated disorder normalization

Cited by: 95
Authors
Leaman, Robert [1]
Khare, Ritu [1]
Lu, Zhiyong [1]
Affiliations
[1] NIH, NCBI, NLM, Bethesda, MD 20894 USA
Keywords
Natural language processing; Electronic health records; Information extraction; Text; UMLS
DOI
10.1016/j.jbi.2015.07.010
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Background: Identifying key variables such as disorders within the clinical narratives in electronic health records has wide-ranging applications within clinical practice and biomedical research. Previous research has demonstrated that disorder named entity recognition (NER) and normalization (or grounding) perform worse in clinical narratives than in biomedical publications. In this work, we aim to identify the cause of this performance difference and introduce general solutions.
Methods: We use closure properties to compare the richness of the vocabulary in clinical narrative text to that in biomedical publications. We approach both disorder NER and normalization using machine learning methodologies. Our NER methodology is based on linear-chain conditional random fields with a rich feature approach, and we introduce several improvements to enhance the lexical knowledge of the NER system. Our normalization method - never previously applied to clinical data - uses pairwise learning to rank to automatically learn term variation directly from the training data.
Results: We find that while the size of the overall vocabulary is similar between clinical narrative and biomedical publications, clinical narrative uses a richer terminology to describe disorders than publications. We apply our system, DNorm-C, to locate disorder mentions in the clinical narratives from the recent ShARe/CLEF eHealth Task. For NER (strict span-only), our system achieves precision = 0.797, recall = 0.713, f-score = 0.753. For the normalization task (strict span + concept) it achieves precision = 0.712, recall = 0.637, f-score = 0.672. The improvements described in this article increase the NER f-score by 0.039 and the normalization f-score by 0.036. We also describe a high recall version of the NER, which increases the normalization recall to as high as 0.744, albeit with reduced precision.
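The reported f-scores can be checked against the reported precision and recall. The short sketch below assumes the abstract's "f-score" is the balanced F-measure (the harmonic mean of precision and recall); the abstract does not state the formula, and the function name `f_score` is only illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: the abstract's "f-score" is this balanced (F1) variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract.
ner_f = f_score(0.797, 0.713)    # NER, strict span-only
norm_f = f_score(0.712, 0.637)   # normalization, strict span + concept

print(round(ner_f, 3), round(norm_f, 3))  # prints 0.753 0.672
```

Under that assumption, both computed values agree with the f-scores given in the abstract to three decimal places.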
Discussion: We perform an error analysis, demonstrating that NER errors outnumber normalization errors by more than 4-to-1. Abbreviations and acronyms are found to be frequent causes of error, in addition to mentions that the annotators were not able to identify within the scope of the controlled vocabulary.
Conclusion: Disorder mentions in text from clinical narratives use a rich vocabulary that results in high term variation, which we believe to be one of the primary causes of reduced performance in clinical narrative. We show that pairwise learning to rank offers high performance in this context, and introduce several lexical enhancements - generalizable to other clinical NER tasks - that improve the ability of the NER system to handle this variation. DNorm-C is a high performing, open source system for disorders in clinical text, and a promising step toward NER and normalization methods that are trainable for a wide variety of domains and entities. (DNorm-C is open source software, and is available with a trained model at the DNorm demonstration website: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#DNorm.) Published by Elsevier Inc.
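To illustrate how pairwise learning to rank can learn term variation from training data, here is a toy sketch. It is not the DNorm-C implementation: the bilinear token-pair scoring, the margin-based update, the function names, and the two-entry lexicon are all assumptions chosen for illustration. Each (mention token, name token) pair carries a weight, initialized so that only exact token matches score, and weights are adjusted whenever the correct name fails to outscore a wrong name by a margin.

```python
from itertools import product


def tokens(text):
    return set(text.lower().split())


def weight(weights, a, b):
    # Unseen pairs default to the identity: 1.0 for an exact token match, else 0.0.
    return weights.get((a, b), 1.0 if a == b else 0.0)


def score(weights, mention, name):
    # Bilinear score: sum of pair weights over all (mention token, name token) pairs.
    return sum(weight(weights, a, b)
               for a, b in product(tokens(mention), tokens(name)))


def train(pairs, names, epochs=10, lr=0.1, margin=1.0):
    """Pairwise ranking: push the gold name above each wrong name by `margin`."""
    weights = {}
    for _ in range(epochs):
        for mention, gold in pairs:
            for wrong in names:
                if wrong == gold:
                    continue
                gap = score(weights, mention, gold) - score(weights, mention, wrong)
                if gap < margin:
                    for a in tokens(mention):
                        for b in tokens(gold):
                            weights[(a, b)] = weight(weights, a, b) + lr
                        for b in tokens(wrong):
                            weights[(a, b)] = weight(weights, a, b) - lr
    return weights


def normalize(weights, mention, names):
    # Return the highest-scoring concept name for the mention.
    return max(names, key=lambda n: score(weights, mention, n))


# Hypothetical toy lexicon and a single training pair.
names = ["myocardial infarction", "panic attack"]
pairs = [("heart attack", "myocardial infarction")]

before = normalize({}, "heart attack", names)
after = normalize(train(pairs, names), "heart attack", names)
print(before, "->", after)
```

With untrained weights the mention is matched by token overlap alone; after training, the learned associations between mention tokens and name tokens let the ranking prefer a name that shares no tokens with the mention, which is the kind of term variation the abstract describes.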
Pages: 28-37
Number of pages: 10