Pivot-based Candidate Retrieval for Cross-lingual Entity Linking

Times cited: 0
Authors
Liu, Qian [1, 2]
Geng, Xiubo [2]
Lu, Jie [1]
Jiang, Daxin [2]
Affiliations
[1] University of Technology Sydney, Australian Artificial Intelligence Institute (AAII), Sydney, NSW, Australia
[2] Microsoft, STCA NLP Group, Beijing, People's Republic of China
Source
PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021) | 2021
Keywords
Information extraction; entity linking; cross-lingual retrieval
DOI
10.1145/3442381.3449852
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Entity candidate retrieval plays a critical role in cross-lingual entity linking (XEL). In XEL, given a mention in a source language (a piece of text in a sentence or question), entity candidate retrieval must return a list of plausible candidate entities from a large knowledge graph in a target language. Existing work falls mainly into two categories: lexicon-based and semantic-based approaches. The lexicon-based approach usually builds cross-lingual and mention-entity lexicons; it is effective but relies heavily on bilingual resources (e.g., inter-language links in Wikipedia). The semantic-based approach maps mentions and entities in different languages into a unified embedding space, which reduces the dependence on large-scale bilingual dictionaries. However, its effectiveness is limited by the representation capacity of fixed-length vectors. In this paper, we propose a pivot-based approach that inherits the advantages of both approaches while avoiding their limitations. It uses an intermediary set of plausible target-language mentions as pivots to bridge two gaps: the cross-lingual gap and the mention-entity gap. Specifically, it first converts a source-language mention into an intermediary set of plausible target-language mentions through cross-lingual semantic retrieval and a selective mechanism, and then retrieves candidate entities for these generated mentions through lexical retrieval. The proposed approach relies only on a small bilingual word dictionary and fully exploits the benefits of both lexical and semantic matching. Experimental results on two challenging cross-lingual entity linking datasets spanning over 11 languages show that the pivot-based approach outperforms both the lexicon-based and the semantic-based approaches by a large margin.
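The two-stage pipeline described in the abstract (semantic retrieval of target-language pivot mentions, then lexical retrieval of entities from those pivots) can be illustrated with a small sketch. This is not the authors' implementation, and several parts are assumptions: the hand-written vectors in the hypothetical `TARGET_MENTIONS` table stand in for the output of a trained cross-lingual encoder, the selective mechanism is replaced by a plain top-k plus cosine-similarity threshold, and entities are scored with standard Okapi BM25 (Lucene-style non-negative IDF, k1 = 1.5, b = 0.75), keeping each entity's best score over all pivots. All function and variable names are illustrative.

```python
import math
from collections import Counter

# Hypothetical toy embeddings standing in for a cross-lingual encoder's output
# for target-language (English) mentions. A real system would use a trained model.
TARGET_MENTIONS = {
    "new york": [0.9, 0.1, 0.0],
    "new york city": [0.85, 0.15, 0.05],
    "york": [0.4, 0.6, 0.0],
    "paris": [0.0, 0.1, 0.95],
}

# Hypothetical target-language knowledge-graph entity names.
ENTITIES = ["New York City", "New York (state)", "York", "Paris", "Paris Hilton"]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_pivots(src_vec, k=3, threshold=0.8):
    """Stage 1: semantic retrieval of the top-k target-language mentions,
    followed by a similarity threshold (a stand-in for the selective mechanism)."""
    scored = sorted(
        ((cosine(src_vec, vec), m) for m, vec in TARGET_MENTIONS.items()),
        reverse=True,
    )
    return [m for s, m in scored[:k] if s >= threshold]


def tokenize(text):
    return text.lower().replace("(", " ").replace(")", " ").split()


class BM25:
    """Okapi BM25 over a small in-memory collection of entity names."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        df = Counter(t for d in self.docs for t in set(d))
        # Lucene-style IDF, which is never negative.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        doc = self.docs[i]
        tf = Counter(doc)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            length_norm = 1 - self.b + self.b * len(doc) / self.avgdl
            total += self.idf[term] * f * (self.k1 + 1) / (f + self.k1 * length_norm)
        return total


def retrieve_candidates(src_vec, top_n=3):
    """Stage 2: use each pivot as a lexical query; an entity's final score is
    its maximum BM25 score over all pivots (an assumed aggregation rule)."""
    pivots = select_pivots(src_vec)
    bm25 = BM25(ENTITIES)
    best = {}
    for pivot in pivots:
        for i, name in enumerate(ENTITIES):
            s = bm25.score(pivot, i)
            if s > 0:
                best[name] = max(best.get(name, 0.0), s)
    return pivots, sorted(best, key=best.get, reverse=True)[:top_n]


# Hypothetical encoder vector for a source-language mention of "New York".
pivots, candidates = retrieve_candidates([0.88, 0.12, 0.02])
print(pivots)      # ['new york', 'new york city']
print(candidates)  # ['New York City', 'New York (state)', 'York']
```

In this toy run, "york" is retrieved semantically but filtered out by the threshold, and "Paris" never becomes a candidate because no surviving pivot shares a token with it. Splitting the work this way lets exact lexical matching, rather than a single fixed-length vector, decide the final entity ranking.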
Pages: 1076-1085
Page count: 10