Capsule Network Improved Multi-Head Attention for Word Sense Disambiguation

Cited: 0
Authors
Cheng, Jinfeng [1 ]
Tong, Weiqin [1 ,2 ]
Yan, Weian [1 ]
Affiliations
[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
[2] Shanghai Univ, Shanghai Inst Adv Commun & Data Sci, Shanghai 200444, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2021, Vol. 11, No. 6
Keywords
word sense disambiguation; multi-head attention; capsule network; capsule routing;
DOI
10.3390/app11062488
CLC Number (Chinese Library Classification)
O6 [Chemistry];
Discipline Code
0703;
Abstract
Word sense disambiguation (WSD) is one of the core problems in natural language processing (NLP): mapping an ambiguous word to its correct meaning in a specific context. Recent studies have shown a lively interest in incorporating sense definitions (glosses) into neural networks, which has contributed greatly to improving WSD performance. However, disambiguating polysemous words with rare senses remains difficult. In this paper, while still taking glosses into account, we further improve the WSD system from the perspective of semantic representation. We encode the context and the sense glosses of the target polysemous word independently, using encoders with the same structure. To obtain a better representation in each encoder, we leverage a capsule network to capture the different kinds of important information contained in multi-head attention. We finally choose the gloss representation closest to the context representation of the target word as its correct sense. We conduct experiments on the English all-words WSD task. The results show that our method achieves good performance and is particularly effective at disambiguating words with rare senses.
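The abstract describes the method only at a high level. The sketch below illustrates the two ideas it names: dynamic routing-by-agreement (Sabour et al., 2017) applied over the per-head outputs of multi-head attention, and nearest-gloss sense selection by similarity to the context representation. All function names, tensor shapes, capsule counts, and the pooling step are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def squash(s, eps=1e-8):
    # Capsule non-linearity: shrink short vectors toward zero,
    # keep long vectors just below unit length.
    norm2 = (s ** 2).sum(dim=-1, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / (norm2.sqrt() + eps)

def route_heads(head_outputs, num_out_caps=4, iters=3):
    # Treat each attention head's output as an input capsule and combine
    # them into num_out_caps output capsules by routing-by-agreement.
    # head_outputs: (batch, heads, dim) -> returns (batch, caps, dim)
    b, h, _ = head_outputs.shape
    logits = torch.zeros(b, h, num_out_caps, device=head_outputs.device)
    for _ in range(iters):
        c = logits.softmax(dim=-1)                        # coupling coefficients
        s = torch.einsum('bho,bhd->bod', c, head_outputs) # weighted sum per capsule
        v = squash(s)
        # Heads that agree with an output capsule route more strongly to it.
        logits = logits + torch.einsum('bhd,bod->bho', head_outputs, v)
    return v

def pick_sense(context_vec, gloss_vecs):
    # Select the candidate gloss whose representation is closest
    # (by cosine similarity) to the context representation.
    sims = F.cosine_similarity(context_vec.unsqueeze(0), gloss_vecs, dim=-1)
    return int(sims.argmax())

# Toy usage: random tensors stand in for the context/gloss encoder outputs.
heads = torch.randn(1, 8, 64)                            # 8 attention heads, dim 64
context_vec = route_heads(heads).flatten(1).squeeze(0)   # pooled context vector
gloss_vecs = torch.randn(3, context_vec.numel())         # 3 candidate sense glosses
print('predicted sense index:', pick_sense(context_vec, gloss_vecs))
```

In this sketch the same routing module would sit on top of both the context encoder and the gloss encoder, consistent with the abstract's statement that the two encoders share one structure.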
Pages: 14