DISCOVERING FOREIGN KEYS ON WEB TABLES WITH THE CROWD

Times cited: 2
Authors
Wu, Xiaoyu [1]
Wang, Ning [1]
Liu, Huaxi [1]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing 100044, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Foreign key; web tables; crowdsourcing; task selection; task reduction; semantic recovery;
DOI
10.31577/cai_2019_3_621
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A foreign-key relationship is one of the most important constraints between two tables. Previous work has focused on detecting inclusion dependencies (INDs) or foreign keys in relational databases. Discovering foreign-key relationships is also helpful for analyzing and integrating data in web tables. However, because web tables are of poor quality, it is difficult to discover foreign keys with existing techniques that rely on checking basic integrity constraints. In this paper, we propose a hybrid human-machine framework to detect foreign keys on web tables. After a machine algorithm discovers candidates and evaluates their confidence of being true foreign keys, we verify those candidates by leveraging the power of the crowd. To reduce the monetary cost, we propose a dynamic task selection technique based on conflict detection and inclusion dependency, which eliminates redundant tasks and assigns the most valuable tasks to workers. Additionally, to help workers complete tasks more effectively and efficiently, a sampling strategy is applied to minimize the number of tuples posed to the crowd. We conducted extensive experiments on real-world datasets, and the results show that our framework improves foreign key detection accuracy on web tables at lower monetary and time cost.
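The machine step mentioned in the abstract (finding candidates and scoring them) can be illustrated with an approximate inclusion-dependency check. The sketch below is not the authors' algorithm: the function names, the normalization (trimming and lowercasing), the uniqueness test on the referenced column, and the 0.8 threshold are all assumptions chosen for illustration. It scores a column pair by the fraction of distinct dependent values found in the referenced column, which tolerates some of the dirtiness typical of web tables.

```python
def _clean(values):
    """Normalize cell values: trim, lowercase, drop empty cells (an assumption)."""
    return [v.strip().lower() for v in values if v and v.strip()]


def inclusion_ratio(dependent, referenced):
    """Fraction of distinct dependent values that also occur in the referenced column."""
    dep = set(_clean(dependent))
    ref = set(_clean(referenced))
    if not dep:
        return 0.0
    return len(dep & ref) / len(dep)


def candidate_foreign_keys(tables, threshold=0.8):
    """Return ((table, column), (table, column), score) triples, best first.

    `tables` maps table name -> {column name -> list of cell strings}.
    The threshold is a hypothetical cut-off, not a value from the paper.
    """
    candidates = []
    for t_dep, cols_dep in tables.items():
        for c_dep, vals_dep in cols_dep.items():
            for t_ref, cols_ref in tables.items():
                if t_ref == t_dep:
                    continue
                for c_ref, vals_ref in cols_ref.items():
                    cleaned = _clean(vals_ref)
                    # The referenced column must look like a key: non-empty and unique.
                    if not cleaned or len(set(cleaned)) != len(cleaned):
                        continue
                    score = inclusion_ratio(vals_dep, vals_ref)
                    if score >= threshold:
                        candidates.append(((t_dep, c_dep), (t_ref, c_ref), score))
    return sorted(candidates, key=lambda c: c[2], reverse=True)


tables = {
    "cities": {
        "city": ["Paris", "Lyon", "Berlin"],
        "country": ["France", "France", "Germany"],
    },
    "countries": {
        "name": ["France", "Germany", "Italy"],
        "capital": ["Paris", "Berlin", "Rome"],
    },
}
result = candidate_foreign_keys(tables)
```

In a hybrid setting like the one the abstract describes, candidates that pass such a machine filter would then be sent to crowd workers for verification; the score could serve as one input when ranking which tasks to ask first.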
Pages: 621-646
Page count: 26