Extracting Contextualized Quantity Facts from Web Tables

Cited by: 11
Authors
Ho, Vinh Thinh [1]
Pal, Koninika [1]
Razniewski, Simon [1]
Berberich, Klaus [1, 2]
Weikum, Gerhard [1]
Affiliations
[1] Max Planck Inst Informat, Saarbrucken, Germany
[2] Htw Saar, Saarbrucken, Germany
Source
PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021) | 2021
Keywords
Information Extraction; Quantity Facts; Web Tables;
DOI
10.1145/3442381.3450072
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Quantity queries, with filter conditions on quantitative measures of entities, are beyond the functionality of search engines and QA assistants. To enable such queries over web contents, this paper develops a novel method for automatically extracting quantity facts from ad-hoc web tables. This involves recognizing quantities, with normalized values and units, aligning them with the proper entities, and contextualizing these pairs with informative cues to match sophisticated queries with modifiers. Our method includes a new approach to aligning quantity columns to entity columns. Prior works assumed a single subject-column per table, whereas our approach is geared for complex tables and leverages external corpora as evidence. For contextualization, we identify informative cues from text and structural markup that surrounds a table. For query-time fact ranking, we devise a new scoring technique that exploits both context similarity and inter-fact consistency. Comparisons of our building blocks against state-of-the-art baselines and extrinsic experiments with two query benchmarks demonstrate the benefits of our method.
Pages: 4033-4042
Number of pages: 10