Column-specific Context Extraction for Web Tables

Times cited: 9
Authors
Braunschweig, Katrin [1]
Thiele, Maik [1]
Eberius, Julian [1]
Lehner, Wolfgang [1]
Affiliation
[1] Tech Univ Dresden, Dresden, Germany
Source
30th Annual ACM Symposium on Applied Computing, Vols. I and II | 2015
Keywords
Information Extraction
DOI
10.1145/2695664.2695794
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Relational Web tables have become an important resource for applications such as factual search and entity augmentation. A major challenge for the automatic identification of relevant tables on the Web is that many of these tables have missing or non-informative column labels. Research has focused largely on recovering the meaning of columns by inferring class labels from the instances using external knowledge bases. The table context, which often contains additional information on the table's content, is frequently treated as an indicator of the general content of a table, but not as a source of column-specific details. In this paper, we propose a novel approach to identify and extract column-specific information from the context of Web tables. In our extraction framework, we consider different techniques to extract both directly and indirectly related phrases. We perform a number of experiments on Web tables extracted from Wikipedia. The results show that column-specific information extracted using our simple heuristic significantly boosts precision and recall for table and column search.
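The abstract does not spell out the extraction heuristic. As a rough illustration only, the Python sketch below assigns sentences from a table's surrounding text to individual columns: a sentence counts as directly related if it shares a word with the column label, and as indirectly related if it mentions one of the column's cell values. The token-overlap rule, the stopword list, the toy table, and the names `extract_column_context`, `tokenize` and `split_sentences` are hypothetical assumptions, not the authors' method.

```python
import re

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "or", "the", "to"}


def tokenize(text: str) -> set[str]:
    """Return the set of lowercase alphanumeric tokens, minus stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def split_sentences(context: str) -> list[str]:
    """Naively split context text on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]


def extract_column_context(header, rows, context):
    """Assign context sentences to the table columns they appear to describe.

    direct:   the sentence shares a (non-stopword) token with the column label
    indirect: otherwise, the sentence contains every token of some cell value
    Columns are keyed by label, so duplicate labels would overwrite each other.
    """
    # Transpose rows into columns; a table without rows gives empty columns.
    columns = list(zip(*rows)) if rows else [()] * len(header)
    sentences = [(s, tokenize(s)) for s in split_sentences(context)]
    result = {}
    for label, values in zip(header, columns):
        label_tokens = tokenize(label)  # empty for missing/uninformative labels
        value_tokens = [t for t in (tokenize(v) for v in values) if t]
        direct, indirect = [], []
        for sent, sent_tokens in sentences:
            if label_tokens & sent_tokens:
                direct.append(sent)
            elif any(vt <= sent_tokens for vt in value_tokens):
                indirect.append(sent)
        result[label] = {"direct": direct, "indirect": indirect}
    return result


# Hypothetical toy table and context text.
header = ["Name", "Elevation"]
rows = [["Mount Alpha", "4000"], ["Mount Beta", "3500"]]
context = ("The table lists peaks in the fictional Alpha range. "
           "Elevation is given in metres above sea level. "
           "Mount Beta was first climbed in 1900.")
result = extract_column_context(header, rows, context)
# "Elevation" gets the metres sentence as direct context; "Name" gets the
# Mount Beta sentence as indirect context via a cell value.
```

Because an empty or non-informative label yields no label tokens, such a column can still receive context through its cell values, which is the situation the abstract identifies as the main challenge.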
Pages: 1072-1077
Number of pages: 6
Related papers
14 in total
[1] [Anonymous], 2011, 20 INT C WORLD WIDE
[2] Cafarella MJ, 2009, Proceedings of the VLDB Endowment, V2, P1090, DOI 10.14778/1687627.1687750
[3] Eberius J, 2012, VLDB Endowment, V5, P1978
[4] Wang J, 2012, Conceptual Modeling. Proceedings 31st International Conference, ER 2012, P141, DOI 10.1007/978-3-642-34002-4_11
[5] Limaye G, Sarawagi S, Chakrabarti S, 2010, Annotating and Searching Web Tables Using Entities, Types and Relationships, Proceedings of the VLDB Endowment, V3, N1, P1338-1347
[6] Ling X, 2013, IJCAI
[7] Miller GA, 1995, WordNet: A Lexical Database for English, Communications of the ACM, V38, N11, P39-41
[8] Mulwad V, 2011, P 1 INT WORKSH SEARC, P17
[9] Pimplikar R, Sarawagi S, 2012, Answering Table Queries on the Web using Column Keywords, Proceedings of the VLDB Endowment, V5, N10, P908-919
[10] Socher R, 2013, ACL 13