Aggregate Query Processing Algorithm on Incomplete Data Based on Denotational Semantics

Cited by: 0
Authors
Zhang A.-Z. [1, 2]
Li J.-Z. [1 ]
Gao H. [1 ]
Affiliations
[1] School of Computer Science and Technology, Harbin Institute of Technology, Harbin
[2] School of Computer Science, Shenyang Aerospace University, Shenyang
Source
Ruan Jian Xue Bao/Journal of Software | 2020, Vol. 31, No. 02
Funding
National Natural Science Foundation of China
Keywords
Approximate query processing; Data reparation; Data usability; Incomplete data; Result estimation
DOI
10.13328/j.cnki.jos.005876
Abstract
This work studies aggregate query processing over incomplete data based on denotational semantics. Incomplete data, that is, data with missing values, falls into two categories: applicable nulls and inapplicable nulls. Existing imputation algorithms cannot guarantee the accuracy of query results computed after imputation, so this study gives interval estimations of aggregate query results instead. It extends the relational model under denotational semantics so that the model covers all types of incomplete data, and it defines a new semantics of aggregate query answers over incomplete data: reliable answers, which are interval estimations that cover the ground-truth query results with high probability. For SUM, COUNT, and AVG queries, linear approximate evaluation algorithms are proposed to compute reliable answers. Extensive experiments on real and synthetic datasets verify the effectiveness of the proposed method. © Copyright 2020, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
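To make the idea of an interval answer concrete, the following is a minimal sketch and not the paper's algorithm. It computes deterministic worst-case bounds rather than the probabilistic reliable answers the abstract describes. The assumptions are ours: an applicable null is encoded as `None`, an inapplicable null as the hypothetical marker `INAPPLICABLE`, every applicable null is assumed to lie in a known range `[lo, hi]`, and inapplicable nulls are excluded from all aggregates.

```python
from typing import Dict, Optional, Sequence, Tuple, Union

# Hypothetical marker (our assumption): no value exists for this tuple.
INAPPLICABLE = "n/a"

Value = Union[float, str, None]
Interval = Tuple[float, float]


def aggregate_bounds(values: Sequence[Value], lo: float, hi: float) -> Dict[str, object]:
    """Worst-case bounds for COUNT, SUM, and AVG over a column with nulls.

    None         -> applicable null: a value exists but is unknown, assumed in [lo, hi]
    INAPPLICABLE -> inapplicable null: no value exists, so the tuple is skipped
    """
    observed = [v for v in values if v is not None and v != INAPPLICABLE]
    unknown = sum(1 for v in values if v is None)

    # Applicable nulls still count as tuples that carry a value.
    count = len(observed) + unknown
    known_sum = sum(observed)

    # Each unknown value contributes at least lo and at most hi.
    sum_iv: Interval = (known_sum + unknown * lo, known_sum + unknown * hi)

    # COUNT is exact under these assumptions, so AVG bounds follow directly.
    avg_iv: Optional[Interval] = (
        (sum_iv[0] / count, sum_iv[1] / count) if count else None
    )
    return {"COUNT": count, "SUM": sum_iv, "AVG": avg_iv}


if __name__ == "__main__":
    column = [10.0, None, 30.0, INAPPLICABLE, None]
    print(aggregate_bounds(column, lo=0.0, hi=100.0))
```

Such bounds always contain the true result but can be very wide. An interval that only needs to cover the true result with high probability, as in the abstract, can be much tighter.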
Pages: 406-420
Number of pages: 14