Improving Entity Linking in Chinese Domain by Sense Embedding Based on Graph Clustering

Cited by: 0

Authors:

Zhao-Bo Zhang

Zhi-Man Zhong

Ping-Peng Yuan

Hai Jin

Affiliations:

[1] National Engineering Research Center for Big Data Technology and System, Huazhong University of Science and Technology

[2] Service Computing Technology and System Laboratory, Huazhong University of Science and Technology

[3] Cluster and Grid Computing Laboratory, Huazhong University of Science and Technology

[4] School of Computer Science and Technology, Huazhong University of Science and Technology

Source:

Journal of Computer Science and Technology | 2023, Vol. 38

Keywords:

natural language processing (NLP); domain entity linking; computational linguistics; word sense disambiguation; knowledge graph

DOI:

Not available


Abstract:

Entity linking maps a string in a text to the corresponding entity in a knowledge base in two steps: candidate entity generation and candidate entity ranking. It is important for NLP (natural language processing) tasks such as question answering. Chinese entity linking is harder than English entity linking because Chinese text has no spaces or capitalization to mark word boundaries and because its characters and words are ambiguous, a problem that is more pronounced in certain scenarios. In Chinese domain-specific settings such as industry, the generated candidate entities are usually long strings that are heavily nested. In addition, the words that make up industrial entities are sometimes ambiguous. Because the semantic space of these words is a subspace of the general word embedding space, each entity word must be assigned its exact meaning within the domain. We therefore propose two schemes for better Chinese entity linking. First, we implement an n-gram-based candidate entity generation method that increases recall and reduces nesting noise. Second, we enhance candidate entity ranking by introducing sense embedding. Because a general-purpose word vector mixes several senses of a word, whereas a word used in the industrial domain carries a single sense, we design a sense embedding model based on graph clustering: it induces word senses in an unsupervised way and learns sense representations jointly with context. We evaluate the embedding quality of our approach on classical datasets and demonstrate its disambiguation ability in general scenarios. Experiments confirm that our method better captures the underlying regularities of candidate entities in the industrial domain and achieves better entity linking performance.
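The two schemes summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy knowledge-base name set, the rule of dropping nested spans, the use of connected components as the graph-clustering step, and word-overlap sense selection are all illustrative assumptions. It shows (1) character n-gram matching for candidate generation in unsegmented Chinese text and (2) inducing the senses of a word by clustering the graph of its context words, then assigning a new occurrence to the closest sense.

```python
from collections import defaultdict
from itertools import combinations


def generate_candidates(text, kb_entities, max_n=8):
    """Return (start, end, surface) spans whose surface string is a KB entity name."""
    spans = []
    # Chinese text has no spaces, so every character n-gram is a potential mention.
    for i in range(len(text)):
        for n in range(1, min(max_n, len(text) - i) + 1):
            gram = text[i:i + n]
            if gram in kb_entities:
                spans.append((i, i + n, gram))
    # Assumption (hypothetical filter): drop spans strictly nested inside a longer
    # matched span to reduce nesting noise, keeping only maximal matches.
    return [s for s in spans
            if not any(o != s and o[0] <= s[0] and s[1] <= o[1] for o in spans)]


def induce_senses(target, sentences, min_weight=1):
    """Cluster the co-occurrence graph of `target`'s context words into senses.

    Nodes are words co-occurring with `target`; two nodes are linked when they
    co-occur in at least `min_weight` sentences. Each connected component is
    treated as one sense, a simple stand-in for a real graph-clustering method.
    """
    weights = defaultdict(int)
    nodes = set()
    for sent in sentences:
        if target not in sent:
            continue
        ctx = sorted(set(sent) - {target})
        nodes.update(ctx)
        for a, b in combinations(ctx, 2):  # sorted input gives a < b keys
            weights[(a, b)] += 1
    adj = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= min_weight:
            adj[a].add(b)
            adj[b].add(a)
    senses, seen = [], set()
    for start in sorted(nodes):  # sorted for a deterministic sense order
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # depth-first search over one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(adj[node] - seen)
        senses.append(comp)
    return senses


def disambiguate(context, senses):
    """Return the index of the sense cluster sharing the most words with `context`."""
    overlap = [len(set(context) & s) for s in senses]
    return max(range(len(senses)), key=overlap.__getitem__)


if __name__ == "__main__":
    kb = {"数控机床", "机床", "主轴", "主轴故障"}  # hypothetical KB entity names
    print(generate_candidates("数控机床主轴故障", kb))
    # [(0, 4, '数控机床'), (4, 8, '主轴故障')]
    corpus = [["bank", "river", "water"], ["bank", "water", "fish"],
              ["bank", "loan", "money"], ["bank", "money", "account"]]
    senses = induce_senses("bank", corpus)
    print(disambiguate(["money", "interest"], senses))  # 0 (the finance sense)
    print(disambiguate(["river", "boat"], senses))  # 1 (the river sense)
```

Since the abstract describes senses as embeddings learned jointly with context, a closer approximation would replace the overlap count with cosine similarity between a context vector and per-sense vectors; the set-based version is used here only to keep the sketch dependency-free.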

Pages: 196-210

Number of pages: 14
