Pivot-based Candidate Retrieval for Cross-lingual Entity Linking

Times cited: 0

Authors:

Liu, Qian [1, 2]

Geng, Xiubo [2]

Lu, Jie [1]

Jiang, Daxin [2]

Affiliations:

[1] Univ Technol Sydney, Australian Artificial Intelligence Inst AAII, Sydney, NSW, Australia

[2] Microsoft, STCA NLP Grp, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021) | 2021

Keywords:

Information extraction; entity linking; cross-lingual retrieval

DOI:

10.1145/3442381.3449852

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Entity candidate retrieval plays a critical role in cross-lingual entity linking (XEL). In XEL, given a mention (a span of text in a source-language sentence or question), candidate retrieval must return a list of plausible candidate entities from a large knowledge graph in the target language. Existing work falls mainly into two categories: lexicon-based and semantic-based approaches. The lexicon-based approach usually builds cross-lingual and mention-entity lexicons; it is effective but relies heavily on bilingual resources (e.g., inter-language links in Wikipedia). The semantic-based approach maps mentions and entities in different languages into a unified embedding space, which reduces dependence on large-scale bilingual dictionaries; however, its effectiveness is limited by the representational capacity of fixed-length vectors. In this paper, we propose a pivot-based approach that inherits the advantages of both approaches while avoiding their limitations. It uses an intermediary set of plausible target-language mentions as pivots to bridge two gaps: the cross-lingual gap and the mention-entity gap. Specifically, it first converts a source-language mention into an intermediary set of plausible target-language mentions via cross-lingual semantic retrieval and a selective mechanism, and then retrieves candidate entities for the generated mentions via lexical retrieval. The proposed approach relies only on a small bilingual word dictionary and fully exploits the benefits of both lexical and semantic matching. Experimental results on two challenging cross-lingual entity linking datasets spanning over 11 languages show that the pivot-based approach outperforms both the lexicon-based and the semantic-based approaches by a large margin.
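To make the two-stage pipeline described in the abstract more concrete, here is a minimal Python sketch under toy assumptions. The embeddings, the target-language mention vocabulary, the entity titles, the cosine-similarity threshold standing in for the selective mechanism, the use of Okapi BM25 as the lexical retriever, and max-pooling of scores over pivots are all hypothetical choices for illustration. This is not the authors' implementation; a real system would obtain the cross-lingual vectors from a trained encoder.

```python
import math
from collections import Counter

# Hypothetical toy vectors in a shared cross-lingual space (assumption: a real
# system would produce these with a trained cross-lingual encoder).
SOURCE_EMB = {"巴拉克·奥巴马": [0.9, 0.1, 0.0]}
TARGET_MENTION_EMB = {
    "barack obama": [0.88, 0.12, 0.01],
    "obama": [0.80, 0.20, 0.05],
    "michelle obama": [0.50, 0.50, 0.10],
    "chicago": [0.05, 0.10, 0.95],
}
# Hypothetical target-language entity titles from the knowledge graph.
ENTITIES = ["Barack Obama", "Michelle Obama", "Obama, Fukui", "Chicago"]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pivot_mentions(src_vec, k=3, threshold=0.9):
    """Stage 1: cross-lingual semantic retrieval of target-language mentions.
    The fixed threshold is an assumed stand-in for the selective mechanism."""
    scored = sorted(
        ((cosine(src_vec, vec), m) for m, vec in TARGET_MENTION_EMB.items()),
        reverse=True,
    )
    return [m for s, m in scored[:k] if s >= threshold]


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Stage 2: lexical retrieval, scoring each entity title with Okapi BM25."""
    toks = [d.lower().replace(",", "").split() for d in docs]
    avgdl = sum(len(t) for t in toks) / len(toks)
    n = len(docs)
    df = Counter(w for t in toks for w in set(t))  # document frequency per token
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for q in query.split():
            if q not in tf:
                continue
            idf = math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5))
            s += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores


def retrieve_candidates(src_mention, top_n=3):
    """Run lexical retrieval once per pivot and keep each entity's best score
    (max-pooling over pivots is an assumption)."""
    best = [0.0] * len(ENTITIES)
    for m in pivot_mentions(SOURCE_EMB[src_mention]):
        for i, s in enumerate(bm25_scores(m, ENTITIES)):
            best[i] = max(best[i], s)
    ranked = sorted(range(len(ENTITIES)), key=lambda i: best[i], reverse=True)
    return [ENTITIES[i] for i in ranked[:top_n] if best[i] > 0]


print(retrieve_candidates("巴拉克·奥巴马"))
# ['Barack Obama', 'Michelle Obama', 'Obama, Fukui']
```

In this toy run the selective step keeps only "barack obama" and "obama" as pivots, yet the lexical stage still surfaces "Michelle Obama" and "Obama, Fukui" because they share the token "obama"; choosing among such candidates is left to the downstream disambiguation step of entity linking.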


Pages: 1076–1085

Number of pages: 10
