Extracting Contextualized Quantity Facts from Web Tables

Cited by: 11
Authors
Ho, Vinh Thinh [1]
Pal, Koninika [1]
Razniewski, Simon [1]
Berberich, Klaus [1, 2]
Weikum, Gerhard [1]
Affiliations
[1] Max Planck Inst Informat, Saarbrucken, Germany
[2] Htw Saar, Saarbrucken, Germany
Source
PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021) | 2021
Keywords
Information Extraction; Quantity Facts; Web Tables;
DOI
10.1145/3442381.3450072
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Quantity queries, with filter conditions on quantitative measures of entities, are beyond the functionality of search engines and QA assistants. To enable such queries over web contents, this paper develops a novel method for automatically extracting quantity facts from ad-hoc web tables. This involves recognizing quantities, with normalized values and units, aligning them with the proper entities, and contextualizing these pairs with informative cues to match sophisticated queries with modifiers. Our method includes a new approach to aligning quantity columns to entity columns. Prior works assumed a single subject-column per table, whereas our approach is geared for complex tables and leverages external corpora as evidence. For contextualization, we identify informative cues from text and structural markup that surrounds a table. For query-time fact ranking, we devise a new scoring technique that exploits both context similarity and inter-fact consistency. Comparisons of our building blocks against state-of-the-art baselines and extrinsic experiments with two query benchmarks demonstrate the benefits of our method.
Pages: 4033-4042
Page count: 10
Related papers
50 items in total
[21]   Generating Titles for Web Tables [J].
Hancock, Braden ;
Lee, Hongrae ;
Yu, Cong .
WEB CONFERENCE 2019: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2019), 2019, :638-647
[22]   On extracting link information of relationship instances from a web site [J].
Naing, MM ;
Lim, EP ;
Goh, DHL .
WEB SERVICES -ICWS-EUROPE 2003, PROCEEDINGS, 2003, 2853 :213-226
[23]   Research of Extracting Data from HTML Web Pages Automatically [J].
王茹 ;
宋瀚涛 ;
陆玉昌 .
Journal of Beijing Institute of Technology, 2003, (S1) :104-108
[24]   Extracting Attribute-Value Pairs from Product Specifications on the Web [J].
Petrovski, Petar ;
Bizer, Christian .
2017 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2017), 2017, :558-565
[25]   TabEL: Entity Linking in Web Tables [J].
Bhagavatula, Chandra Sekhar ;
Noraset, Thanapon ;
Downey, Doug .
SEMANTIC WEB - ISWC 2015, PT I, 2015, 9366 :425-441
[26]   Automatic construction of RDF with web tables [J].
Yan, Li ;
Sheng, Jie ;
Tu, Yaofeng ;
Zhou, Xiangsheng ;
Ma, Zongmin .
EXPERT SYSTEMS WITH APPLICATIONS, 2021, 182
[27]   Beyond supervised learning of wrappers for extracting information from unseen Web sites [J].
Wong, TL ;
Lam, W ;
Wang, W .
INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING, 2003, 2690 :725-733
[28]   Self-Adaptive Extracting Academic Entities from World Wide Web [J].
Yuan, Pingpeng ;
Li, Yi ;
Jin, Hai ;
Liu, Ling .
2015 IEEE CONFERENCE ON COLLABORATION AND INTERNET COMPUTING (CIC), 2015, :270-277
[29]   The Technology of Extracting Content Information from Web Page Based on DOM Tree [J].
Yuan, Dingrong ;
Mo, Zhuoying ;
Xie, Bing ;
Xie, Yangcai .
ADVANCED RESEARCH ON ELECTRONIC COMMERCE, WEB APPLICATION, AND COMMUNICATION, PT 2, 2011, 144 :271-278
[30]   A novel method for extracting information from web pages with multiple presentation templates [J].
Qingzhong L. ;
Yanhui D. ;
An F. ;
Yongquan D. .
Journal of Software, 2010, 5 (05) :506-513