Aggregate Query Processing on Incomplete Data

Cited by: 2
Authors
Zhang, Anzhen [1]
Wang, Jinbao [1]
Li, Jianzhong [1]
Gao, Hong [1]
Affiliation
[1] Harbin Inst Technol, Dept Comp Sci & Technol, Harbin, Heilongjiang, Peoples R China
Source
WEB AND BIG DATA (APWEB-WAIM 2018), PT I | 2018 / Vol. 10987
Keywords
Aggregate query; Incomplete data; Estimation
DOI
10.1007/978-3-319-96890-2_24
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Incomplete data has long been an issue in the database community, yet the subject is poorly handled in both theory and practice. In this paper, we propose to estimate the result of an aggregate query on incomplete data directly, rather than imputing the missing values. An interval estimate, consisting of the lower and upper bounds of the aggregate query result over all possible interpretations of the missing values, is presented to end users. The ground-truth aggregate result is guaranteed to lie within this interval. Experimental results are consistent with the theoretical results and suggest that the estimate is invaluable for better assessing the results of aggregate queries on incomplete data.
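The abstract does not spell out how the bounds are computed. As a minimal illustrative sketch (not the authors' algorithm), assume that each tuple is present but some attribute values are missing, and that every missing value is an unknown number somewhere in a known domain [lo, hi], chosen independently of the others. Under that assumption, the lower and upper bounds of several common aggregates over all possible completions have closed forms. The names Interval, aggregate_bounds and count_greater_than, and the domain parameters lo and hi, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Interval:
    lower: float
    upper: float


def aggregate_bounds(values: Sequence[Optional[float]], agg: str,
                     lo: float, hi: float) -> Interval:
    """Bounds of SUM/AVG/MIN/MAX over a column whose None entries are
    missing values. ASSUMPTION: each missing value lies anywhere in [lo, hi]."""
    if lo > hi:
        raise ValueError("domain bounds must satisfy lo <= hi")
    if not values:
        raise ValueError("aggregate over an empty column is undefined")
    known = [v for v in values if v is not None]
    m = len(values) - len(known)  # number of missing values
    n = len(values)               # tuples exist, so the row count is exact
    s = sum(known)
    if agg == "SUM":
        # Extremes: every missing value at the bottom / top of the domain.
        return Interval(s + m * lo, s + m * hi)
    if agg == "AVG":
        # The denominator n is fixed, so AVG bounds are the SUM bounds / n.
        return Interval((s + m * lo) / n, (s + m * hi) / n)
    if agg == "MIN":
        if m == 0:
            return Interval(min(known), min(known))
        kmin = min(known, default=float("inf"))
        # Lowest MIN: one missing value set to lo; highest: all set to hi.
        return Interval(min(kmin, lo), min(kmin, hi))
    if agg == "MAX":
        if m == 0:
            return Interval(max(known), max(known))
        kmax = max(known, default=float("-inf"))
        # Lowest MAX: all missing values set to lo; highest: one set to hi.
        return Interval(max(kmax, lo), max(kmax, hi))
    raise ValueError(f"unsupported aggregate: {agg}")


def count_greater_than(values: Sequence[Optional[float]], c: float,
                       lo: float, hi: float) -> Interval:
    """Bounds of COUNT(*) WHERE value > c, under the same [lo, hi] assumption."""
    known_hits = sum(1 for v in values if v is not None and v > c)
    m = sum(1 for v in values if v is None)
    must = m if lo > c else 0  # missing values certain to satisfy value > c
    may = m if hi > c else 0   # missing values that could satisfy value > c
    return Interval(known_hits + must, known_hits + may)
```

For the column [3.0, None, 7.0, None] with domain [0, 10], SUM is bounded by [10, 30], AVG by [2.5, 7.5], MIN by [0, 3], MAX by [7, 10], and the count of values greater than 5 by [1, 3]. Any completion of the two missing values within the domain yields a result inside these intervals, which mirrors the guarantee stated in the abstract. Tighter bounds would need extra knowledge (constraints, correlations) that this sketch does not model.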
Pages: 286-294
Number of pages: 9