Aggregate Query Processing Algorithm on Incomplete Data Based on Denotational Semantics

Authors
Zhang A.-Z. [1, 2]
Li J.-Z. [1 ]
Gao H. [1 ]
Affiliations
[1] School of Computer Science and Technology, Harbin Institute of Technology, Harbin
[2] School of Computer Science, Shenyang Aerospace University, Shenyang
Source
Ruan Jian Xue Bao/Journal of Software | 2020 / Vol. 31 / No. 02
Funding
National Natural Science Foundation of China
Keywords
Approximate query processing; Data reparation; Data usability; Incomplete data; Result estimation;
DOI
10.13328/j.cnki.jos.005876
Abstract
This work studies aggregate query processing over incomplete data based on denotational semantics. Incomplete data, also known as missing values, falls into two categories: applicable nulls and inapplicable nulls. Existing imputation algorithms cannot guarantee the accuracy of query results computed after imputation, so this study gives an interval estimation of the aggregate query result instead. It extends the relational model under denotational semantics so that the model covers all types of incomplete data, and it defines a new semantics of aggregate query answers over incomplete data: reliable answers, which are interval estimations that cover the ground-truth query result with high probability. For SUM, COUNT, and AVG queries, linear approximate evaluation algorithms are proposed to compute reliable answers. Extensive experiments on real and synthetic datasets verify the effectiveness of the proposed method. © Copyright 2020, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
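To make the idea of an interval answer concrete, here is a minimal Python sketch. It is not the algorithm from the paper. It computes deterministic worst-case bounds for SUM, COUNT, and AVG, which is simpler than the high-probability reliable answers the abstract describes. It assumes every missing value is an applicable null (a value exists but is unknown) lying in a known range `[lo, hi]`. The function names and the range parameters are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

Value = Optional[float]  # None stands for an applicable null (assumption)


def sum_bounds(values: Sequence[Value], lo: float, hi: float) -> Tuple[float, float]:
    """Worst-case interval for SUM when each missing value lies in [lo, hi]."""
    known = sum(v for v in values if v is not None)
    missing = sum(1 for v in values if v is None)
    return known + missing * lo, known + missing * hi


def count_bounds(values: Sequence[Value],
                 predicate: Callable[[float], bool]) -> Tuple[int, int]:
    """Interval for COUNT of rows satisfying a predicate.

    A missing value may or may not satisfy the predicate, so it only
    widens the upper bound.
    """
    sure = sum(1 for v in values if v is not None and predicate(v))
    missing = sum(1 for v in values if v is None)
    return sure, sure + missing


def avg_bounds(values: Sequence[Value], lo: float, hi: float) -> Tuple[float, float]:
    """Worst-case interval for AVG over all rows, nulls included."""
    if not values:
        raise ValueError("AVG is undefined on an empty column")
    s_lo, s_hi = sum_bounds(values, lo, hi)
    return s_lo / len(values), s_hi / len(values)


column = [10.0, None, 30.0, None]
print(sum_bounds(column, 0.0, 100.0))
print(count_bounds(column, lambda v: v >= 20.0))
print(avg_bounds(column, 0.0, 100.0))
```

These bounds always contain the true answer but can be very wide. A probabilistic interval of the kind the abstract describes gives up certainty for a narrower estimate.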
Pages: 406-420
Page count: 14