Approximate Discovery of Functional Dependencies for Large Datasets

被引:13
作者
Bleifuss, Tobias [1 ]
Buelow, Susanne [1 ]
Frohnhofen, Johannes [1 ]
Risch, Julian [1 ]
Wiese, Georg [1 ]
Kruse, Sebastian [1 ]
Papenbrock, Thorsten [1 ]
Naumann, Felix [1 ]
机构
[1] Hasso Plattner Inst, Prof Dr Helmert Str 2-3, D-14482 Potsdam, Germany
来源
CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT | 2016年
关键词
DATABASE;
D O I
10.1145/2983323.2983781
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Functional dependencies (FDs) are an important prerequisite for various data management tasks, such as schema normalization, query optimization, and data cleansing. However, automatic FD discovery entails an exponentially growing search and solution space, so that even today's fastest FD discovery algorithms are limited to small datasets only, due to long runtimes and high memory consumptions. To overcome this situation, we propose an approximate discovery strategy that sacrifices possibly little result correctness in return for large performance improvements. In particular, we introduce AID-FD, an algorithm that approximately discovers FDs within runtimes up to orders of magnitude faster than state-of-the-art FD discovery algorithms. We evaluate and compare our performance results with a focus on scalability in runtime and memory, and with measures for completeness, correctness, and minimality.
引用
收藏
页码:1803 / 1812
页数:10
相关论文
共 15 条
[1]  
Abedjan Z., 2014, P 23 ACM INT C INF K, P949, DOI 10.1145/2661829.2661884
[2]   Profiling relational data: a survey [J].
Abedjan, Ziawasch ;
Golab, Lukasz ;
Naumann, Felix .
VLDB JOURNAL, 2015, 24 (04) :557-581
[3]  
Aboulnaga Ashraf, 2004, P ACM SIGMOD INT C M, P647, DOI 10.1145/1007568.1007641
[4]  
Armstrong W.W., 1974, IFIP C, P580
[5]  
Brown Paul G, 2003, Proceedings of the 29th international conference on Very large data bases-Volume 29, VLDB '2003, P668
[6]   LOGIC-BASED APPROACH TO SEMANTIC QUERY OPTIMIZATION [J].
CHAKRAVARTHY, US ;
GRANT, J ;
MINKER, J .
ACM TRANSACTIONS ON DATABASE SYSTEMS, 1990, 15 (02) :162-207
[7]  
Flach PA, 1999, AI COMMUN, V12, P139
[8]   TANE:: An efficient algorithm for discovering functional and approximate dependencies [J].
Huhtala, Y ;
Kärkkäinen, J ;
Porkka, P ;
Toivonen, H .
COMPUTER JOURNAL, 1999, 42 (02) :100-111
[9]   APPROXIMATE INFERENCE OF FUNCTIONAL-DEPENDENCIES FROM RELATIONS [J].
KIVINEN, J ;
MANNILA, H .
THEORETICAL COMPUTER SCIENCE, 1995, 149 (01) :129-149
[10]  
Liu J., 2010, IEEE Trans. Knowl. Data Eng., V24, P251