Named Entity Recognition in Unstructured Medical Text Documents

Cited by: 3
Authors
Pearson, Cole [1 ]
Seliya, Naeem [1 ]
Dave, Rushit [1 ]
Affiliations
[1] Univ Wisconsin, Dept Comp Sci, Eau Claire, WI 54701 USA
Source
INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND ENERGY TECHNOLOGIES (ICECET 2021) | 2021
Keywords
named entity recognition; de-identification; independent medical examination; natural language processing; machine learning; health records
DOI
10.1109/ICECET52533.2021.9698694
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Physicians provide expert opinions to courts on the medical state of patients, including whether a patient is likely to have permanent or non-permanent injuries or ailments. An independent medical examination (IME) report summarizes a physician's medical opinion about a patient's health status based on the physician's expertise. IME reports contain private and sensitive information (Personally Identifiable Information, or PII) that must be removed or randomly encoded before further research can be conducted. In our study, the independent medical examiner is an orthopedic surgeon from a private practice in the United States. The goal of this research is to perform named entity recognition (NER) to identify and subsequently remove or encode PII in IME reports prepared by the physician. We apply the NER toolkits of OpenNLP and spaCy, two freely available natural language processing platforms, and compare their precision, recall, and F-measure at identifying five categories of PII across trials of randomly selected IME reports, using each model's common default parameters. We find that both platforms achieve high performance (F-measure > 0.9) at de-identification and that a spaCy model trained with a 70-30 train-test data split performs best.
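As an illustration of the metrics named in the abstract, the following is a minimal sketch of entity-level precision, recall, and F-measure under exact span matching. It is not the authors' evaluation code: the function name `prf`, the span representation, and the labels `PERSON`, `DATE`, `ADDRESS`, and `PHONE` are assumptions made for this example, since the record does not list the five PII categories or the matching criterion.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start offset, end offset, label).
Span = Tuple[int, int, str]


def prf(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) under exact span-and-label matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up annotations for one report; labels are illustrative only.
gold = {(0, 8, "PERSON"), (20, 30, "DATE"), (45, 57, "ADDRESS")}
pred = {(0, 8, "PERSON"), (20, 30, "DATE"), (60, 70, "PHONE")}

p, r, f = prf(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f-measure={f:.3f}")
```

Exact matching is the strictest common choice; a relaxed variant that credits partially overlapping spans would generally report higher scores on the same predictions.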
Pages: 412-417
Page count: 6