Capturing Missing Tuples and Missing Values

被引:9
作者
Deng, Ting [1 ,2 ,5 ]
Fan, Wenfei [1 ,2 ,3 ,6 ]
Geerts, Floris [4 ]
机构
[1] Beihang Univ, RCBD, Beijing 100191, Peoples R China
[2] Beihang Univ, SKLSDE, Beijing 100191, Peoples R China
[3] Univ Edinburgh, Informat, Edinburgh EH8 9YL, Midlothian, Scotland
[4] Univ Antwerp, Dept Math & Comp Sci, Antwerp, Belgium
[5] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
[6] Univ Edinburgh, Sch Informat, Edinburgh EH8 9YL, Midlothian, Scotland
来源
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2016年 / 41卷 / 02期
基金
英国工程与自然科学研究理事会;
关键词
Design; Algorithms; Theory; Incomplete information; relative completeness; master data management; partially closed databases; complexity; INCOMPLETE INFORMATION; COMPLEXITY;
D O I
10.1145/2901737
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Databases in real life are often neither entirely closed-world nor entirely open-world. Databases in an enterprise are typically partially closed, in which a part of the data is constrained by master data that contains complete information about the enterprise in certain aspects. It has been shown that, despite missing tuples, such a database may turn out to have complete information for answering a query. This article studies partially closed databases from which both tuples and attribute values may be missing. We specify such a database in terms of conditional tables constrained by master data, referred to as c-instances. We first propose three models to characterize whether a c-instance T is complete for a query Q relative to master data. That is, depending on how missing values in T are instantiated, the answer to Q in T remains unchanged when new tuples are added. We then investigate three problems, to determine (a) whether a given c-instance is complete for a query Q, (b) whether there exists a c-instance that is complete for Q relative to master data available, and (c) whether a c-instance is a minimal-size database that is complete for Q. We establish matching lower and upper bounds on these problems for queries expressed in a variety of languages in each of the three models for specifying relative completeness.
引用
收藏
页数:47
相关论文
共 36 条
[1]   ON THE REPRESENTATION AND QUERYING OF SETS OF POSSIBLE WORLDS [J].
ABITEBOUL, S ;
KANELLAKIS, P ;
GRAHNE, G .
THEORETICAL COMPUTER SCIENCE, 1991, 78 (01) :159-187
[2]  
Abiteboul S., 1998, Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. PODS 1998, P254, DOI 10.1145/275487.275516
[3]  
Abiteboul S, 1995, FDN DATABASES
[4]  
[Anonymous], THESIS
[5]  
[Anonymous], 1991, Lecture Notes in Computer Science
[6]  
[Anonymous], 2012, SYNTH LECT DATA MANA
[7]  
[Anonymous], 2002, 21 ACM SIGACT SIGMOD, DOI DOI 10.1145/543613.543644
[8]  
Arenas M., 1999, Proceedings of the Eighteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, P68, DOI 10.1145/303976.303983
[9]  
Arenas M, 2009, SIGMOD REC, V38, P17, DOI 10.1145/1815933.1815938
[10]   On the data complexity of relative information completeness [J].
Cao, Yang ;
Deng, Ting ;
Fan, Wenfei ;
Geerts, Floris .
INFORMATION SYSTEMS, 2014, 45 :18-34