Probabilistic skylines on uncertain data: model and bounding-pruning-refining methods

被引:0
作者
Bin Jiang
Jian Pei
Xuemin Lin
Yidong Yuan
机构
[1] Simon Fraser University,School of Computing Science
[2] The University of New South Wales and NICTA,School of Computer Science and Engineering
来源
Journal of Intelligent Information Systems | 2012年 / 38卷
关键词
Uncertain data; Skyline queries; Probabilistic queries; Algorithms;
D O I
暂无
中图分类号
学科分类号
摘要
Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains an open problem at large. In this paper, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic skyline model where an uncertain object may take a probability to be in the skyline, and a p-skyline contains all objects whose skyline probabilities are at least p (0 < p ≤ 1). Computing probabilistic skylines on large uncertain data sets is challenging. We develop a bounding-pruning-refining framework and three algorithms systematically. The bottom-up algorithm computes the skyline probabilities of some selected instances of uncertain objects, and uses those instances to prune other instances and uncertain objects effectively. The top-down algorithm recursively partitions the instances of uncertain objects into subsets, and prunes subsets and objects aggressively. Combining the advantages of the bottom-up algorithm and the top-down algorithm, we develop a hybrid algorithm to further improve the performance. Our experimental results on both the real NBA player data set and the benchmark synthetic data sets show that probabilistic skylines are interesting and useful, and our algorithms are efficient on large data sets.
引用
收藏
页码:1 / 39
页数:38
相关论文
共 12 条
  • [1] Bentley JL(1975)Multidimensional binary search trees used for associative searching Communications of the ACM (CACM) 18 509-517
  • [2] Bentley JL(1978)On the average number of maxima in a set of vectors and applications Journal of the ACM 25 536-543
  • [3] Kung HT(1984)Incomplete information in relational databases Journal of the ACM 31 761-791
  • [4] Schkolnick M(1975)On finding the maxima of a set of vectors Journal of the ACM 22 469-476
  • [5] Thompson CD(2006)Maintaining sliding window skylines on data streams IEEE Transactions on Knowledge and Data Engineering 18 377-391
  • [6] Imielinski T(undefined)undefined undefined undefined undefined-undefined
  • [7] Witold Lipski J(undefined)undefined undefined undefined undefined-undefined
  • [8] Kung HT(undefined)undefined undefined undefined undefined-undefined
  • [9] Luccio F(undefined)undefined undefined undefined undefined-undefined
  • [10] Preparata FP(undefined)undefined undefined undefined undefined-undefined