Context Embedding Based on Bi-LSTM in Semi-Supervised Biomedical Word Sense Disambiguation

Cited by: 19
Authors
Li, Zhi [1 ,2 ]
Yang, Fan [3 ]
Luo, Yaoru [1 ]
Affiliations
[1] Univ Sichuan, Coll Elect & Informat Engn, Chengdu 610065, Sichuan, Peoples R China
[2] Univ Sichuan, Key Lab Wireless Power Transmiss, Minist Educ, Chengdu 610065, Sichuan, Peoples R China
[3] Univ Sichuan, West China Hosp 2, Key Lab Obstetr & Gynecol & Pediat Dis & Birth De, Minist Educ,Dept Gynecol & Obstet, Chengdu 610041, Sichuan, Peoples R China
Keywords
Word sense disambiguation; semi-supervised learning; context embedding; biomedical domain
DOI
10.1109/ACCESS.2019.2912584
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Word sense disambiguation (WSD) is a basic task of natural language processing (NLP), and its purpose is to choose the correct sense of an ambiguous word according to its context. In biomedical WSD, recent research has used context embeddings built by concatenating or averaging word embeddings to represent the sense of a context. These simple linear operations on neighboring words ignore information about word sequence and may leave the resulting models flawed in semantic representation. In this paper, we present a novel language model based on Bi-LSTM that embeds an entire sentential context in continuous space while taking word order into account. We demonstrate that our language model can generate high-quality context representations in an unsupervised manner. Unlike previous work that directly predicts word senses, our model classifies a word in context by building sense embeddings, and this helps us set a new state-of-the-art result (macro/micro average) on both the MSH and NLM datasets. In addition, with the same language model, we propose semi-supervised learning based on label propagation (LP) to reduce the dependence on labeled biomedical data. The results show that this method can closely approach the state-of-the-art results produced by our Bi-LSTM even when the labeled training data are reduced.
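The core idea in the abstract — embedding a whole sentential context with a bidirectional LSTM rather than averaging word vectors — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, the vocabulary size, and the mean-pooling over time steps are all assumptions (the paper may instead use the hidden states at the target word's position), and the resulting context vectors would then be compared against per-sense embeddings to classify the ambiguous word.

```python
# Hedged sketch of Bi-LSTM context embedding for WSD (assumed sizes/pooling).
import torch
import torch.nn as nn

class ContextEmbedder(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # bidirectional=True reads the context left-to-right and right-to-left,
        # so the representation is sensitive to word order, unlike averaging.
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) word indices of the sentential context
        x = self.embed(token_ids)            # (batch, seq_len, embed_dim)
        out, _ = self.lstm(x)                # (batch, seq_len, 2 * hidden_dim)
        # Mean-pool over time to get one fixed-size vector per context.
        return out.mean(dim=1)               # (batch, 2 * hidden_dim)

ctx = ContextEmbedder(vocab_size=10_000)
tokens = torch.randint(0, 10_000, (4, 12))   # 4 contexts of 12 tokens
vec = ctx(tokens)
print(vec.shape)                              # torch.Size([4, 512])
```

For the semi-supervised step, these context vectors could feed a graph-based label propagation routine (e.g. scikit-learn's `LabelPropagation`), spreading the few gold sense labels to unlabeled contexts that lie nearby in embedding space.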
Pages: 72928-72935
Page count: 8
Related Papers
40 references in total
[1]   Random Walks for Knowledge-Based Word Sense Disambiguation [J].
Agirre, Eneko ;
Lopez de Lacalle, Oier ;
Soroa, Aitor .
COMPUTATIONAL LINGUISTICS, 2014, 40 (01) :57-84
[2]  
Akbari M., 2018, P ICWSM, P118
[3]   LEARNING LONG-TERM DEPENDENCIES WITH GRADIENT DESCENT IS DIFFICULT [J].
BENGIO, Y ;
SIMARD, P ;
FRASCONI, P .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 1994, 5 (02) :157-166
[4]  
Brody S., 2009, Proc. of EACL, P103, DOI 10.3115/1609067.1609078
[5]  
Camacho-Collados J, 2015, PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1, P741
[6]   Word sense disambiguation in the clinical domain: a comparison of knowledge-rich and knowledge-poor unsupervised methods [J].
Chasin, Rachel ;
Rumshisky, Anna ;
Uzuner, Ozlem ;
Szolovits, Peter .
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2014, 21 (05) :842-849
[7]   Unsupervised ensemble ranking of terms in electronic health record notes based on their importance to patients [J].
Chen, Jinying ;
Yu, Hong .
JOURNAL OF BIOMEDICAL INFORMATICS, 2017, 68 :121-131
[8]  
Chen Xinxiong, 2014, P 2014 C EMPIRICAL M, P1025
[9]   Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction [J].
Di Marco, Antonio ;
Navigli, Roberto .
COMPUTATIONAL LINGUISTICS, 2013, 39 (03) :709-754
[10]  
Hochreiter S, 1997, Neural Computation, V9, P1735