Knowledge Base Completion via Search-Based Question Answering

Cited by: 149
Authors
West, Robert [1]
Gabrilovich, Evgeniy [2]
Murphy, Kevin [2]
Sun, Shaohua [2]
Gupta, Rahul [2]
Lin, Dekang [2]
Affiliations
[1] Stanford Univ, Comp Sci Dept, Stanford, CA 94305 USA
[2] Google, 1600 Amphitheatre Pkwy, Mountain View, CA 94043 USA
Source
WWW '14: Proceedings of the 23rd International Conference on World Wide Web | 2014
Keywords
Freebase; slot filling; information extraction
DOI
10.1145/2566486.2568032
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Over the past few years, massive amounts of world knowledge have been accumulated in publicly available knowledge bases, such as Freebase, NELL, and YAGO. Yet despite their seemingly huge size, these knowledge bases are greatly incomplete. For example, over 70% of people included in Freebase have no known place of birth, and 99% have no known ethnicity. In this paper, we propose a way to leverage existing Web-search-based question-answering technology to fill in the gaps in knowledge bases in a targeted way. In particular, for each entity attribute, we learn the best set of queries to ask, such that the answer snippets returned by the search engine are most likely to contain the correct value for that attribute. For example, if we want to find Frank Zappa's mother, we could ask the query 'who is the mother of Frank Zappa'. However, this is likely to return 'The Mothers of Invention', which was the name of his band. Our system learns that it should (in this case) add disambiguating terms, such as Zappa's place of birth, in order to make it more likely that the search results contain snippets mentioning his mother. Our system also learns how many different queries to ask for each attribute, since in some cases, asking too many can hurt accuracy (by introducing false positives). We discuss how to aggregate candidate answers across multiple queries, ultimately returning probabilistic predictions for possible values for each attribute. Finally, we evaluate our system and show that it is able to extract a large number of facts with high confidence.
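The query-expansion and aggregation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the function names `build_queries` and `aggregate`, the reciprocal-rank scoring rule, the placeholder terms, and the stubbed search results are all assumptions made for illustration. The paper itself learns which queries to ask and how to combine their answers.

```python
from collections import defaultdict


def build_queries(subject, attribute, extra_terms=()):
    """Return a base query plus one variant per disambiguating term.

    Hypothetical helper: the paper learns which terms to add; here they
    are simply passed in by the caller.
    """
    base = f"{subject} {attribute}"
    return [base] + [f"{base} {term}" for term in extra_terms]


def aggregate(ranked_answers_per_query):
    """Combine ranked candidate answers from several queries.

    Assumed scoring rule (not from the paper): each candidate earns
    1/rank per query, and the totals are normalized to sum to 1.
    """
    scores = defaultdict(float)
    for ranked in ranked_answers_per_query:
        for rank, answer in enumerate(ranked, start=1):
            scores[answer] += 1.0 / rank
    total = sum(scores.values())
    if total == 0:
        return {}
    return {answer: score / total for answer, score in scores.items()}


# Placeholder disambiguating terms; a real system would substitute known
# attribute values of the entity (the abstract mentions place of birth).
queries = build_queries(
    "Frank Zappa", "mother", extra_terms=["<place of birth>", "<year of birth>"]
)

# Stubbed candidate lists standing in for search-engine answers, one list
# per query. "<candidate mother>" is a placeholder, not a real result.
stub_results = {
    queries[0]: ["The Mothers of Invention", "<candidate mother>"],
    queries[1]: ["<candidate mother>", "The Mothers of Invention"],
    queries[2]: ["<candidate mother>"],
}

predictions = aggregate([stub_results[q] for q in queries])
for answer, probability in sorted(predictions.items(), key=lambda kv: -kv[1]):
    print(f"{probability:.3f}  {answer}")
```

In this toy setup the bare query ranks the band name first, while the disambiguated variants shift most of the normalized score to the other candidate, which mirrors the motivation the abstract gives for adding disambiguating terms.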
Pages: 515-525
Page count: 11