Efficient computation of deletion-robust k-coverage queries

Authors
Jiping Zheng
Xingnan Huang
Yuan Ma
Affiliations
[1] Nanjing University of Aeronautics and Astronautics, College of Computer Science and Technology
[2] Nanjing University, Department of Computer Science and Technology
Source
Knowledge and Information Systems, 2021, Vol. 63
Keywords
k-coverage queries; Representative skyline; Sieving procedure; Robust-Coreset; Chain structure
DOI
Not available
Abstract
Extracting a small, controllable subset from a large-scale dataset so that users can fully understand the entire dataset is an important topic in multicriteria decision making. In recent years, this problem has been widely studied, and various query models have been proposed, such as top-k, skyline, k-regret and k-coverage queries. Among these models, the k-coverage query is an ideal query model, as it offers stability, scale invariance and high traversal efficiency. However, existing methods, including k-coverage queries, cannot efficiently provide an effective solution set when some points are deleted from the dataset. In this paper, we study the robustness of k-coverage queries in two settings involving the dynamic deletion of data points: in the first, the whole dataset is assumed to be available in advance; in the second, the data points arrive as a stream. For a centralized dataset, we introduce a sieving mechanism that uses a precalculated threshold to filter a coreset from the entire dataset. The k-coverage query can then be carried out on this small coreset instead of the entire dataset, and we propose a threshold-based k-coverage query algorithm that greatly accelerates query processing. For a streaming dataset, we adopt a special chain structure and propose a single-pass streaming algorithm named Robust-Sieving; the coreset-based method is also extended to this setting. In addition, sampling techniques are adopted to accelerate query processing in both settings. Extensive experiments verify the effectiveness of the proposed Robust-Sieving algorithm and of the coreset-based algorithms with and without sampling.
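The threshold-based filtering idea in the abstract can be illustrated with a minimal Python sketch. It assumes a dominance-based notion of coverage (a point covers itself and every point it dominates, with larger attribute values preferred) and a fixed, caller-supplied threshold `tau`. The names `dominates`, `cover_set` and `threshold_select` are hypothetical. The sketch does not reproduce the paper's coreset construction, its deletion-robustness guarantees, its chain structure or how its threshold is precalculated.

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """Assumed dominance: p is >= q on every attribute and > on at least one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def cover_set(p: Point, universe: Sequence[Point]) -> Set[int]:
    """Indices of points in `universe` that p covers (itself or dominated points).

    This coverage definition is an assumption made for illustration only.
    """
    return {i for i, q in enumerate(universe) if q == p or dominates(p, q)}


def threshold_select(candidates: Sequence[Point], universe: Sequence[Point],
                     k: int, tau: int) -> Tuple[List[Point], int]:
    """Single pass over `candidates`: keep a point only if it covers at least
    `tau` points of `universe` not yet covered, stopping once k points are kept.

    Returns the selected points and how many universe points they cover.
    """
    chosen: List[Point] = []
    covered: Set[int] = set()
    for p in candidates:
        if len(chosen) == k:
            break
        gain = cover_set(p, universe) - covered  # marginal coverage of p
        if len(gain) >= tau:
            chosen.append(p)
            covered |= gain
    return chosen, len(covered)


if __name__ == "__main__":
    data = [(3, 1), (1, 3), (2, 0), (0, 2), (0, 0)]
    print(threshold_select(data, data, k=2, tau=2))  # ([(3, 1), (1, 3)], 5)
    print(threshold_select(data, data, k=2, tau=3))  # ([(3, 1)], 3)
```

Passing a small candidate list (for example, a previously filtered subset) as `candidates` while keeping the full data as `universe` mirrors, in simplified form, the abstract's idea of running the query on a small subset rather than the whole dataset. A fixed threshold can also return fewer than k points, as the second call shows.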
Pages: 759-789
Number of pages: 30