Extracting Contextualized Quantity Facts from Web Tables

Cited by: 11
Authors
Ho, Vinh Thinh [1]
Pal, Koninika [1]
Razniewski, Simon [1]
Berberich, Klaus [1,2]
Weikum, Gerhard [1]
Affiliations
[1] Max Planck Institute for Informatics, Saarbrücken, Germany
[2] HTW Saar, Saarbrücken, Germany
Source
PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021) | 2021
Keywords
Information Extraction; Quantity Facts; Web Tables
DOI
10.1145/3442381.3450072
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Quantity queries, with filter conditions on quantitative measures of entities, are beyond the functionality of search engines and QA assistants. To enable such queries over web contents, this paper develops a novel method for automatically extracting quantity facts from ad-hoc web tables. This involves recognizing quantities, with normalized values and units, aligning them with the proper entities, and contextualizing these pairs with informative cues to match sophisticated queries with modifiers. Our method includes a new approach to aligning quantity columns to entity columns. Prior works assumed a single subject-column per table, whereas our approach is geared for complex tables and leverages external corpora as evidence. For contextualization, we identify informative cues from text and structural markup that surrounds a table. For query-time fact ranking, we devise a new scoring technique that exploits both context similarity and inter-fact consistency. Comparisons of our building blocks against state-of-the-art baselines and extrinsic experiments with two query benchmarks demonstrate the benefits of our method.
Pages: 4033 - 4042
Number of pages: 10
Related papers
50 in total
  • [1] Hybrid approach to extracting information from web-tables
    Jung, Sung-won
    Kang, Mi-young
    Kwon, Hyuk-chul
    COMPUTER PROCESSING OF ORIENTAL LANGUAGES, PROCEEDINGS: BEYOND THE ORIENT: THE RESEARCH CHALLENGES AHEAD, 2006, 4285 : 109 - +
  • [2] A scalable hybrid approach for extracting head components from Web tables
    Jung, SW
    Kwon, HC
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2006, 18 (02) : 174 - 187
  • [3] Extracting Knowledge from Web Tables Based on DOM Tree Similarity
    Wu, Xiaolong
    Cao, Cungen
    Wang, Ya
    Fu, Jianhui
    Wang, Shi
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2016, 2016, 9983 : 302 - 313
  • [4] On extracting data from tables that are encoded using HTML
    Roldan, Juan C.
    Jimenez, Patricia
    Corchuelo, Rafael
    KNOWLEDGE-BASED SYSTEMS, 2020, 190
  • [5] Enhancing Knowledge Bases with Quantity Facts
    Ho, Vinh Thinh
    Stepanova, Daria
    Milchevski, Dragan
    Strotgen, Jannik
    Weikum, Gerhard
    PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22), 2022, : 893 - 901
  • [6] Extracting logical structures from HTML tables
    Kim, Yeon-Seok
    Lee, Kyong-Ho
    COMPUTER STANDARDS & INTERFACES, 2008, 30 (05) : 296 - 308
  • [7] Putting Web Tables into Context
    Braunschweig, Katrin
    Thiele, Maik
    Koci, Elvis
    Lehner, Wolfgang
    KDIR: PROCEEDINGS OF THE 8TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT - VOL. 1, 2016, : 158 - 165
  • [8] Corroborate and Learn Facts from the Web
    Zhao, Shubin
    Betz, Jonathan
    KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2007, : 995 - 1003
  • [9] Harvesting relational tables from lists on the web
    Elmeleegy, Hazem
    Madhavan, Jayant
    Halevy, Alon
    VLDB JOURNAL, 2011, 20 (02) : 209 - 226