Automatic Normalization of Anatomical Phrases in Radiology Reports Using Unsupervised Learning

Cited by: 10
Authors
Tahmasebi, Amir M. [1]
Zhu, Henghui [2]
Mankovich, Gabriel [1]
Prinsen, Peter [3]
Klassen, Prescott [1]
Pilato, Sam [1]
van Ommering, Rob [1]
Patel, Pritesh [4]
Gunn, Martin L. [5]
Chang, Paul [4]
Affiliations
[1] Philips Res North Amer, 2 Canal Pk, 3rd Floor, Cambridge, MA 02141 USA
[2] Boston Univ, Div Syst Engn, Brookline, MA USA
[3] Philips Res, Eindhoven, North Brabant, Netherlands
[4] Univ Chicago, Med Ctr, Dept Radiol, Chicago, IL 60637 USA
[5] Univ Washington, Dept Radiol, Seattle, WA 98195 USA
Keywords
Radiology reports; Concept normalization; Anatomical classification; word2vec; Semantic learning; SNOMED CT; TEXT
DOI
10.1007/s10278-018-0116-5
Chinese Library Classification (CLC) number
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Discipline classification codes
1002; 100207; 1009
Abstract
In today's radiology workflow, free-text reporting is established as the most common medium to capture, store, and communicate clinical information. Radiologists routinely refer to a patient's prior radiology reports to recall critical information for a new diagnosis, a process that is tedious, time-consuming, and prone to human error. Automatic structuring of report content is desired to facilitate such information retrieval. In this work, we propose an unsupervised machine learning approach to automatically structure radiology reports by detecting and normalizing anatomical phrases based on the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) ontology. The proposed approach combines word embedding-based semantic learning with ontology-based concept mapping to derive the desired concept normalization. The word embedding model was trained using a large corpus of unlabeled radiology reports. Fifty-six anatomical labels were extracted from SNOMED CT as class labels of the whole human anatomy. The proposed framework was compared against a number of state-of-the-art supervised and unsupervised approaches. Radiology reports from three different clinical sites were manually labeled for testing. The proposed approach outperformed other techniques, yielding an average precision of 82.6%. The proposed framework boosts the coverage and performance of conventional approaches for concept normalization by applying word embedding techniques in semantic learning, while avoiding the need for a large amount of annotated data, which is typically required for training classifiers.
Pages: 6-18
Page count: 13