Towards a theory of tables

Cited by: 14
Author
Hurst, Matthew [1]
Affiliation
[1] Nielsen BuzzMetr, New York, NY 10010 USA
Keywords
table understanding; information extraction
DOI
10.1007/s10032-006-0016-y
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Tables appearing in natural language documents provide a compact method for presenting relational information in an immediate and intuitive manner, while simultaneously organizing and indexing that information. Despite their ubiquity and obvious utility, tables have not received the same level of formal characterization enjoyed by sentential text. Rather, they are modeled in terms of geometry, simple hierarchies of strings, and database-like relational structures. Tables have been the focus of a large volume of research in the document image analysis field and have lately received particular attention from researchers interested in extracting information from non-trivial elements of web pages. This paper provides a framework for representing tables at both the semantic and structural levels. It presents a representation of the indexing structures present in tables and the relationship between these structures and the underlying categories.
Pages: 123-131
Page count: 9