Unified Medical Language System resources improve sieve-based generation and Bidirectional Encoder Representations from Transformers (BERT)-based ranking for concept normalization

Cited by: 14
Authors
Xu, Dongfang [1]
Gopale, Manoj [2]
Zhang, Jiacheng [3]
Brown, Kris [4]
Begoli, Edmon [4]
Bethard, Steven [1]
Affiliations
[1] Univ Arizona, Sch Informat, 1103 E 2nd St,Harvill Bldg,Rm 437D, Tucson, AZ 85721 USA
[2] Univ Arizona, Dept Elect & Comp Engn, Tucson, AZ 85721 USA
[3] Univ Arizona, Dept Comp Sci, Tucson, AZ 85721 USA
[4] Oak Ridge Natl Lab, Natl Ctr Computat Sci, Oak Ridge, TN USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
deep learning; unified medical language system; natural language processing; concept normalization; generate-and-rank; word sense disambiguation; clinical text; terms
DOI
10.1093/jamia/ocaa080
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: Concept normalization, the task of linking phrases in text to concepts in an ontology, is useful for many downstream tasks such as relation extraction and information retrieval. We present a generate-and-rank concept normalization system based on our participation in the 2019 National NLP Clinical Challenges Shared Task Track 3 Concept Normalization.
Materials and Methods: The shared task provided 13 609 concept mentions drawn from 100 discharge summaries. We first design a sieve-based system that uses Lucene indices over the training data, Unified Medical Language System (UMLS) preferred terms, and UMLS synonyms to generate a list of possible concepts for each mention. We then design a listwise classifier based on the BERT (Bidirectional Encoder Representations from Transformers) neural network to rank the candidate concepts, integrating UMLS semantic types through a regularizer.
Results: Our generate-and-rank system placed third of 33 in the competition, outperforming the candidate generator alone (81.66% vs 79.44%) and the previous state of the art (76.35%). During postevaluation, the model's accuracy increased to 83.56% through improvements to how training data are generated from UMLS and incorporation of our UMLS semantic type regularizer.
Discussion: Analysis of the model shows that prioritizing UMLS preferred terms yields better performance, that the UMLS semantic type regularizer results in qualitatively better concept predictions, and that the model performs well even on concepts not seen during training.
Conclusions: Our generate-and-rank framework for UMLS concept normalization integrates key UMLS features like preferred terms and semantic types with a neural network-based ranking model to accurately link phrases in text to UMLS concepts.
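The generate-and-rank idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: plain dictionaries stand in for the Lucene indices, the sieve order is assumed to follow the order in which the abstract lists the resources, matching is assumed to be exact after lowercasing, and the listwise loss is a generic softmax cross-entropy without the semantic type regularizer. All function names, terms, and concept IDs (C_A, C_B, C_C) are hypothetical placeholders.

```python
import math


def normalize(text):
    """Lowercase and collapse whitespace so lookups are case-insensitive."""
    return " ".join(text.lower().split())


def build_index(pairs):
    """Map each normalized term to the list of concept IDs it names."""
    index = {}
    for term, concept_id in pairs:
        concept_ids = index.setdefault(normalize(term), [])
        if concept_id not in concept_ids:
            concept_ids.append(concept_id)
    return index


def generate_candidates(mention, sieves):
    """Return (sieve name, candidates) from the first sieve that matches.

    Earlier sieves take priority; later ones are consulted only when
    every earlier sieve finds nothing (assumed behavior).
    """
    key = normalize(mention)
    for name, index in sieves:
        if key in index:
            return name, list(index[key])
    return None, []


def listwise_loss(scores, gold_position):
    """Softmax cross-entropy computed over the whole candidate list."""
    max_score = max(scores)
    log_sum = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_sum - scores[gold_position]


# Toy resources with placeholder concept IDs (not real UMLS identifiers).
training = build_index([("heart attack", "C_A")])
preferred = build_index([("Myocardial infarction", "C_A"), ("Cold", "C_B")])
synonyms = build_index([("cold", "C_B"), ("cold", "C_C"), ("MI", "C_A")])
sieves = [("training", training), ("preferred", preferred), ("synonyms", synonyms)]

if __name__ == "__main__":
    for mention in ["Heart  Attack", "cold", "MI", "unknown phrase"]:
        print(mention, "->", generate_candidates(mention, sieves))
    # With two equally scored candidates the loss is log(2).
    print(listwise_loss([0.0, 0.0], 0))
```

In this sketch the mention "cold" stops at the preferred-term sieve and never reaches the more ambiguous synonym index, which mirrors the abstract's observation that prioritizing preferred terms helps. A ranker would only be needed when a sieve returns more than one candidate.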
Pages: 1510-1519
Page count: 10