Pivot-based Candidate Retrieval for Cross-lingual Entity Linking

Cited by: 0
Authors
Liu, Qian [1, 2]
Geng, Xiubo [2]
Lu, Jie [1]
Jiang, Daxin [2]
Affiliations
[1] Univ Technol Sydney, Australian Artificial Intelligence Inst AAII, Sydney, NSW, Australia
[2] Microsoft, STCA NLP Grp, Beijing, Peoples R China
Source
PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021) | 2021
Keywords
Information extraction; entity linking; cross-lingual retrieval
DOI
10.1145/3442381.3449852
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Entity candidate retrieval plays a critical role in cross-lingual entity linking (XEL). In XEL, entity candidate retrieval must return a list of plausible candidate entities from a large knowledge graph in a target language, given a mention (a piece of text in a sentence or question) in a source language. Existing works mainly fall into two categories: lexicon-based and semantic-based approaches. The lexicon-based approach usually creates cross-lingual and mention-entity lexicons, which is effective but relies heavily on bilingual resources (e.g., inter-language links in Wikipedia). The semantic-based approach maps mentions and entities in different languages to a unified embedding space, which reduces dependence on large-scale bilingual dictionaries. However, its effectiveness is limited by the representation capacity of fixed-length vectors. In this paper, we propose a pivot-based approach that inherits the advantages of the two aforementioned approaches while avoiding their limitations. It takes an intermediary set of plausible target-language mentions as pivots to bridge two types of gaps: the cross-lingual gap and the mention-entity gap. Specifically, it first converts mentions in the source language into an intermediary set of plausible mentions in the target language by cross-lingual semantic retrieval and a selective mechanism, and then retrieves candidate entities based on the generated mentions by lexical retrieval. The proposed approach relies only on a small bilingual word dictionary, and fully exploits the benefits of both lexical and semantic matching. Experimental results on two challenging cross-lingual entity linking datasets spanning over 11 languages show that the pivot-based approach outperforms both the lexicon-based and semantic-based approaches by a large margin.
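The two-stage flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vectors, the entity IDs (`E1`, `E2`, `E3`), the function names, the use of a similarity threshold as the "selective mechanism", and the token-overlap score as the "lexical retrieval" step are all assumptions made only for illustration.

```python
import math
import re

# Hypothetical toy data: hand-written vectors standing in for a shared
# cross-lingual embedding space. Nothing here comes from the paper.
SOURCE_MENTION_VECS = {"Parigi": [0.9, 0.1, 0.0]}  # source-language mention
TARGET_MENTION_VECS = {                            # target-language mentions
    "paris": [0.88, 0.12, 0.02],
    "paris france": [0.8, 0.2, 0.1],
    "london": [0.1, 0.9, 0.1],
}
ENTITY_NAMES = {"E1": "Paris", "E2": "London", "E3": "Paris Hilton"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_pivots(source_mention, top_k=2, min_score=0.9):
    """Stage 1: semantic retrieval of target-language mentions (pivots).

    The threshold `min_score` is an assumed stand-in for the paper's
    selective mechanism, whose details the abstract does not give.
    """
    query = SOURCE_MENTION_VECS[source_mention]
    scored = sorted(
        ((cosine(query, vec), mention) for mention, vec in TARGET_MENTION_VECS.items()),
        reverse=True,
    )
    return [mention for score, mention in scored[:top_k] if score >= min_score]


def tokens(text):
    """Lowercased word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve_entities(pivots, top_k=5):
    """Stage 2: lexical retrieval of entities by token overlap (Jaccard)."""
    scores = {}
    for pivot in pivots:
        pivot_tokens = tokens(pivot)
        for entity_id, name in ENTITY_NAMES.items():
            name_tokens = tokens(name)
            overlap = len(pivot_tokens & name_tokens) / len(pivot_tokens | name_tokens)
            if overlap > 0:
                scores[entity_id] = max(scores.get(entity_id, 0.0), overlap)
    # Highest score first; ties broken by entity ID for determinism.
    return sorted(scores, key=lambda e: (-scores[e], e))[:top_k]


def link_candidates(source_mention):
    """Source mention -> pivots -> candidate entities."""
    return retrieve_entities(retrieve_pivots(source_mention))


print(link_candidates("Parigi"))
```

In this toy setup the semantic step handles only the cross-lingual gap (mention to mention), and the lexical step handles only the mention-entity gap, which mirrors the division of labor the abstract describes.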
Pages: 1076-1085
Page count: 10