Augmenting Tables by Self-supervised Web Search

Cited by: 0
Authors
Loeser, Alexander [1]
Nagel, Christoph [1]
Pieper, Stephan [1]
Affiliation
[1] Tech Univ Berlin, DIMA Grp, D-10587 Berlin, Germany
Source
Enabling Real-Time Business Intelligence | 2011 / Vol. 84
Keywords
information extraction; document collections; query optimization
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Users are often faced with the problem of searching the Web for missing values in a spreadsheet. Today, only a few US-based search engines have the capacity to aggregate the wealth of information hidden in Web pages that could be used to return these missing values. Therefore, exploiting this information with structured queries, such as join queries, is an often requested but still unsolved requirement of many Web users. A major challenge in this scenario is identifying keyword queries for retrieving relevant pages from a Web search engine. We solve this challenge by automatically generating keywords. Our approach is based on the observation that Web page authors have already evolved common words and grammatical structures for describing important relationship types. Each keyword query should return only pages that are likely to contain a missing relation. Therefore, during query execution our keyword generator continually monitors grammatical structures or lexical phrases in the processed Web pages. In this way, it infers significant and unambiguous keywords for retrieving pages that are likely to match the mechanics of a particular relation extractor. We report an experimental study over multiple relation extractors. Our study demonstrates that the generated keywords efficiently return complete result tuples. In contrast to other approaches, we process only very few Web pages.
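The abstract describes the self-supervised keyword generator only at a high level. The Python below is a minimal sketch under our own assumptions, not the authors' algorithm: it uses plain word n-grams as a stand-in for the lexical phrases and grammatical structures the paper monitors, scores a phrase by the fraction of observed sentences containing it in which the extractor found a relation, and replaces the Web search engine and the relation extractor with toy callables. All names (`KeywordGenerator`, `augment`, `PAGES`, `demo`, the `min_support` threshold, the seed keyword `founder`) are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidate_phrases(sentence, max_len=3):
    """All distinct word n-grams (1..max_len tokens) in a sentence."""
    toks = tokenize(sentence)
    return {
        " ".join(toks[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(toks) - n + 1)
    }


class KeywordGenerator:
    """Hypothetical self-supervised keyword generator (illustration only).

    Counts, per phrase, the sentences in which the extractor did or did not
    find a relation, and prefers phrases that co-occur only with hits.
    """

    def __init__(self, min_support=2):
        self.hits = Counter()    # phrase -> sentences with an extracted relation
        self.misses = Counter()  # phrase -> sentences without one
        self.min_support = min_support

    def observe(self, sentence, found_relation):
        (self.hits if found_relation else self.misses).update(candidate_phrases(sentence))

    def score(self, phrase):
        h, m = self.hits[phrase], self.misses[phrase]
        return h / (h + m) if h + m else 0.0

    def best_keywords(self, k=3):
        eligible = [p for p, h in self.hits.items() if h >= self.min_support]
        # Rank by precision, then support, then longer (more specific) phrases.
        eligible.sort(key=lambda p: (-self.score(p), -self.hits[p], -len(p.split()), p))
        return eligible[:k]

    def query_for(self, entity):
        best = self.best_keywords(1)
        return f'"{entity}" {best[0]}' if best else None


def augment(rows, seed_keyword, search, extract, gen):
    """Fill rows whose 'value' is None, learning keywords as pages are processed.

    search(query) -> list of page texts; extract(sentence) -> value or None.
    Both are caller-supplied stand-ins for a Web search engine and an extractor.
    """
    for row in rows:
        if row["value"] is not None:
            continue
        # Use a learned keyword if one has enough support, else the seed keyword.
        query = gen.query_for(row["entity"]) or f'"{row["entity"]}" {seed_keyword}'
        for page in search(query):
            for sentence in re.split(r"(?<=[.!?])\s+", page):
                value = extract(sentence)
                gen.observe(sentence, value is not None)  # self-supervised feedback
                if value is not None and row["value"] is None:
                    row["value"] = value
    return rows


# Toy stand-ins for a search engine's result pages (hypothetical data).
PAGES = {
    "Apple": ["Apple was founded by Steve Jobs. Apple released a new phone."],
    "Microsoft": ["Microsoft was founded by Bill Gates. Microsoft released Windows."],
    "Google": ["Google was founded by Larry Page and Sergey Brin."],
}


def demo():
    issued = []  # records every keyword query sent to the toy search engine

    def search(query):
        issued.append(query)
        return PAGES.get(query.split('"')[1], [])

    def extract(sentence):
        # Toy "founded by" relation extractor: returns a two-word person name.
        m = re.search(r"founded by ([A-Z][a-z]+ [A-Z][a-z]+)", sentence)
        return m.group(1) if m else None

    rows = [{"entity": e, "value": None} for e in ("Apple", "Microsoft", "Google")]
    gen = KeywordGenerator()
    augment(rows, "founder", search, extract, gen)
    return rows, gen, issued
```

Calling `demo()` fills all three missing founder values. The first two rows fall back to the seed query because no phrase yet has enough support, while the third row is queried with the learned phrase `was founded by`. In this toy run `was`, `by` and `was founded by` all have precision 1.0, so the tie-break toward longer phrases selects the most specific keyword; on real pages a word such as `by` would also occur in sentences without a relation, lowering its precision score.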
Pages: 84-99
Number of pages: 16