Measuring Semantic Relatedness with Knowledge Association Network

Cited by: 2
Authors
Li, Jiapeng [1 ,2 ]
Chen, Wei [1 ]
Gu, Binbin [1 ]
Fang, Junhua [1 ]
Li, Zhixu [1 ,3 ]
Zhao, Lei [1 ]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Peoples R China
[2] Inst Elect & Informat Engn UESTC Guangdong, Dongguan 523808, Guangdong, Peoples R China
[3] IFLYTEK Res, Suzhou, Peoples R China
Source
DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2019), PT I | 2019 / Vol. 11446
Funding
National Natural Science Foundation of China;
Keywords
Semantic relatedness; Knowledge graph; Network embedding;
DOI
10.1007/978-3-030-18576-3_40
CLC Classification
TP [Automation technology, computer technology];
Discipline Code
0812;
Abstract
Measuring semantic relatedness between two words is a fundamental task for many applications in both the database and natural language processing domains. Conventional methods mainly exploit the latent semantic information hidden in lexical databases (e.g., WordNet) or text corpora (e.g., Wikipedia), and have achieved strong results using distance computation in a lexical tree or the co-occurrence principle in Wikipedia. However, these methods suffer from low coverage and low precision because (1) lexical databases contain abundant lexical information but lack semantic information; and (2) in Wikipedia, two related words (e.g., synonyms) may never appear within the same window or sentence, while unrelated words may be mentioned together by chance. To compute semantic relatedness more accurately, other approaches build on free association networks and achieve a significant improvement in relatedness measurement. Nevertheless, they require complex preprocessing of Wikipedia, and the fixed score functions they adopt limit the models' flexibility and expressiveness. In this paper, we leverage DBPedia and Wikipedia to construct a Knowledge Association Network (KAN), which avoids complex information extraction from Wikipedia. We propose a flexible and expressive model to represent the entities behind the words, in which attribute information and topological structure information of entities are embedded into a vector space simultaneously. Experimental results on standard datasets show that our model outperforms previous models.
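The core idea described above, jointly embedding an entity's attributes and its position in the association network, then scoring word pairs in the shared vector space, can be illustrated with a toy sketch. This is NOT the paper's actual KAN model: the entities, attribute matrix, adjacency matrix, and the SVD-based structural embedding are all invented here purely to make the idea concrete.

```python
import numpy as np

# Toy illustration (not the paper's KAN model): combine each entity's
# attribute vector with a structure-derived vector, then score
# relatedness as cosine similarity of the joint embeddings.

entities = ["cat", "dog", "car"]

# Hypothetical binary attribute matrix (columns might stand for
# attributes like "animal", "pet", "vehicle") -- invented for the sketch.
attributes = np.array([
    [1, 1, 0],  # cat
    [1, 1, 0],  # dog
    [0, 0, 1],  # car
], dtype=float)

# Symmetric adjacency of a tiny association network: cat--dog linked.
adjacency = np.array([
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
], dtype=float)

# Structural embedding: rows of the truncated SVD of the adjacency
# matrix (a stand-in for the network-embedding component).
u, s, _ = np.linalg.svd(adjacency)
structural = u[:, :2] * s[:2]

# Joint embedding: concatenate the attribute and structural parts.
joint = np.hstack([attributes, structural])

def relatedness(i, j):
    """Cosine similarity between the joint embeddings of entities i, j."""
    a, b = joint[i], joint[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(relatedness(0, 1))  # cat vs dog: high (shared attributes)
print(relatedness(0, 2))  # cat vs car: near zero
```

In the actual paper the score function is learned rather than fixed cosine similarity, which is precisely the flexibility the abstract contrasts with earlier fixed-score approaches.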
Pages: 676-691
Page count: 16