Towards the Completion of a Domain-Specific Knowledge Base with Emerging Query Terms

Cited by: 1

Authors:

Jiang, Sihang [1]

Liang, Jiaqing [1,4]

Xiao, Yanghua [1,3]

Tang, Haihong [2]

Huang, Haikuan [2]

Tan, Jun [2]

Affiliations:

[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Data Sci, Shanghai, Peoples R China

[2] Alibaba Grp, Hangzhou, Zhejiang, Peoples R China

[3] Shanghai Inst Intelligent Elect & Syst, Shanghai, Peoples R China

[4] Shuyan Technol, Shanghai, Peoples R China

Source:

2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019) | 2019

Funding:

National Key Research and Development Program of China;

Keywords:

Product Knowledge Base; E-commerce; Knowledge Base Completion; WEB;

DOI:

10.1109/ICDE.2019.00129

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Domain-specific knowledge bases play an increasingly important role in a variety of real applications. In this paper, we use the product knowledge base of Taobao, the largest Chinese e-commerce platform, as an example to investigate a completion procedure for a domain-specific knowledge base. We argue that domain-specific knowledge bases tend to be incomplete, and are oblivious to their incompleteness, unless a continuous completion procedure is in place. The key component of this completion procedure is the classification of emerging query terms into the corresponding properties of categories in the existing taxonomy. We propose using query logs to complete the product knowledge base of Taobao. However, query-driven completion faces many challenges, including distinguishing the fine-grained semantics of unrecognized terms and handling sparse data. We propose a graph-based solution to overcome these challenges. We first construct a large body of positive evidence to establish the semantic similarity between terms, and then run a shortest path or, alternatively, a random walk on the similarity graph, under a set of constraints derived from negative evidence, to find the best candidate property for each emerging query term. Finally, we conduct extensive experiments on real data from Taobao and a subset of CN-DBpedia. The results show that our solution classifies emerging query terms with good performance. Our solution is already deployed in Taobao, helping it find nearly 7 million new values for properties. The completed product knowledge base significantly improves the ratio of recognized queries and recognized terms by more than 25% and 32%, respectively.
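
The shortest-path variant described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `classify_term`, the toy terms, the similarity scores, and the choice of `-log(similarity)` as edge length are all assumptions made for illustration. The sketch runs Dijkstra's algorithm from an emerging term over a similarity graph and returns the property of the nearest already-labeled term, skipping any property that negative evidence rules out for that term.

```python
import heapq
import math


def classify_term(graph, labels, forbidden, source):
    """Return (property, distance) of the nearest labeled term reachable
    from `source`, skipping properties that negative evidence forbids.

    graph:     {term: {neighbor: similarity in (0, 1]}}  (hypothetical data)
    labels:    {term: property} for terms already in the knowledge base
    forbidden: {term: set of properties excluded by negative evidence}
    """
    excluded = forbidden.get(source, set())
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, term = heapq.heappop(heap)
        if d > dist.get(term, math.inf):
            continue  # stale heap entry, a shorter path was already found
        prop = labels.get(term)
        if prop is not None and prop not in excluded:
            return prop, d  # first admissible labeled term is the closest
        for neighbor, similarity in graph.get(term, {}).items():
            # Assumed edge length: high similarity means short distance.
            nd = d - math.log(similarity)
            if nd < dist.get(neighbor, math.inf):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return None, math.inf  # no admissible labeled term is reachable


# Toy similarity graph (hypothetical terms and scores).
graph = {
    "crimson": {"red": 0.9, "rose": 0.6},
    "red": {"crimson": 0.9},
    "rose": {"crimson": 0.6, "tulip": 0.8},
    "tulip": {"rose": 0.8},
}
labels = {"red": "color", "rose": "flower_type", "tulip": "flower_type"}

best, _ = classify_term(graph, labels, {}, "crimson")
constrained, _ = classify_term(graph, labels, {"crimson": {"color"}}, "crimson")
print(best, constrained)
```

In this toy graph, the unconstrained call assigns "crimson" to the property of its most similar labeled neighbor, while the constrained call falls back to the next-closest admissible property. The random-walk alternative mentioned in the abstract is not shown here.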

Citation

Pages: 1430-1441

Page count: 12

