Towards the Completion of a Domain-Specific Knowledge Base with Emerging Query Terms

Cited by: 1
Authors
Jiang, Sihang [1]
Liang, Jiaqing [1, 4]
Xiao, Yanghua [1, 3]
Tang, Haihong [2]
Huang, Haikuan [2]
Tan, Jun [2]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Data Sci, Shanghai, Peoples R China
[2] Alibaba Grp, Hangzhou, Zhejiang, Peoples R China
[3] Shanghai Inst Intelligent Elect & Syst, Shanghai, Peoples R China
[4] Shuyan Technol, Shanghai, Peoples R China
Source
2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019) | 2019
Funding
National Key Research and Development Program of China;
Keywords
Product Knowledge Base; E-commerce; Knowledge Base Completion; WEB;
DOI
10.1109/ICDE.2019.00129
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Domain-specific knowledge bases play an increasingly important role in a variety of real applications. In this paper, we use the product knowledge base of Taobao, the largest Chinese e-commerce platform, as an example to investigate a completion procedure for a domain-specific knowledge base. We argue that domain-specific knowledge bases tend to be incomplete, and remain oblivious to their incompleteness, unless a continuous completion procedure is in place. The key component of this completion procedure is the classification of emerging query terms into the corresponding properties of categories in the existing taxonomy. We propose using query logs to complete the product knowledge base of Taobao. However, query-driven completion faces many challenges, including distinguishing the fine-grained semantics of unrecognized terms and handling sparse data. We propose a graph-based solution to overcome these challenges. We first construct a large body of positive evidence to establish the semantic similarity between terms, and then run a shortest-path search, or alternatively a random walk, on the similarity graph under a set of constraints derived from negative evidence to find the best candidate property for each emerging query term. Finally, we conduct extensive experiments on real data from Taobao and a subset of CN-DBpedia. The results show that our solution classifies emerging query terms with good performance. Our solution is already deployed in Taobao, where it has helped find nearly 7 million new values for properties. The completed product knowledge base significantly improves the ratio of recognized queries and recognized terms by more than 25% and 32%, respectively.
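The abstract does not spell out the graph algorithm, so the following is only a minimal sketch of the general idea of a constrained shortest-path search on a similarity graph, not the authors' implementation. It assumes that edge weights are distances derived from term similarity (smaller means more similar) and that negative evidence can be modeled as a set of property nodes excluded for a given term. The function name `nearest_property`, the node labels, and the weights are all hypothetical.

```python
import heapq


def nearest_property(graph, source, properties, forbidden=frozenset()):
    """Dijkstra search from `source` to the closest allowed property node.

    graph:      dict mapping node -> list of (neighbor, distance) pairs
    properties: set of nodes that represent candidate properties
    forbidden:  property nodes excluded for `source` (assumed stand-in
                for constraints derived from negative evidence)
    Returns (property, distance), or (None, inf) if none is reachable.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        if node in forbidden:
            continue  # excluded property: neither accept nor route through it
        if node in properties:
            return node, d  # first allowed property popped is the nearest
        for neighbor, weight in graph.get(node, []):
            new_dist = d + weight
            if new_dist < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    return None, float("inf")


# Hypothetical toy graph: an emerging term linked to known terms,
# which in turn are linked to properties of product categories.
graph = {
    "xs max": [("xs", 0.25), ("iphone xs", 0.5)],
    "xs": [("clothing:size", 0.125)],
    "iphone xs": [("phone:model", 0.25)],
}
properties = {"clothing:size", "phone:model"}

unconstrained = nearest_property(graph, "xs max", properties)
constrained = nearest_property(
    graph, "xs max", properties, forbidden={"clothing:size"}
)
```

In this toy example the unconstrained search picks the closest property overall, while excluding one property redirects the search to the next-nearest candidate. A random walk could be substituted for the shortest-path step, as the abstract mentions.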
Pages: 1430-1441
Page count: 12