A Pareto Model for OLAP View Size Estimation

Cited by: 0
Authors
Thomas P. Nadeau
Toby J. Teorey
Affiliation
[1] The University of Michigan, Computer Science and Engineering Division (CSE), Department of Electrical Engineering and Computer Science (EECS)
Source
Information Systems Frontiers | 2003 / Vol. 5
Keywords
Pareto distribution; OLAP; view size estimation; materialized view selection
DOI
Not available
Abstract
On-Line Analytical Processing (OLAP) aims at gaining useful information quickly from large amounts of data residing in a data warehouse. To improve the quickness of response to queries, pre-aggregation is a useful strategy. However, it is usually impossible to pre-aggregate along all combinations of the dimensions. The multi-dimensional aspects of the data lead to combinatorial explosion in the number and potential storage size of the aggregates. We must selectively pre-aggregate. Cost/benefit analysis involves estimating the storage requirements of the aggregates in question. We present an original algorithm for estimating the number of rows in an aggregate based on the Pareto distribution model. We test the Pareto Model Algorithm empirically against four published algorithms, and conclude the Pareto Model Algorithm is consistently the best of these algorithms for estimating view size.
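As an illustration of the combinatorial explosion and the row-count estimation problem described in the abstract, the following is a minimal sketch. It does not implement the Pareto Model Algorithm, whose details this record does not give. Instead it uses Cardenas' classic formula, which assumes rows are spread uniformly and independently over the possible dimension-value combinations. The function names `cardenas_rows` and `enumerate_views`, and the example dimension cardinalities, are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import expm1, log1p, prod


def cardenas_rows(fact_rows: int, cardinalities: list[int]) -> float:
    """Expected number of distinct groups under Cardenas' uniform assumption.

    v is the number of possible dimension-value combinations; the estimate
    is v * (1 - (1 - 1/v) ** fact_rows), computed in a numerically stable way.
    """
    if fact_rows <= 0:
        return 0.0
    v = prod(cardinalities)  # empty list gives 1: the grand-total view
    if v <= 1:
        return float(v)
    # (1 - 1/v) ** n == exp(n * log1p(-1/v)); expm1 keeps precision for large v
    return v * -expm1(fact_rows * log1p(-1.0 / v))


def enumerate_views(dims: dict[str, int], fact_rows: int):
    """Yield (group-by columns, estimated rows) for every one of the 2**d views."""
    names = sorted(dims)
    for k in range(len(names) + 1):
        for combo in combinations(names, k):
            yield combo, cardenas_rows(fact_rows, [dims[d] for d in combo])


if __name__ == "__main__":
    # Hypothetical cardinalities, for illustration only.
    example_dims = {"customer": 10_000, "product": 500, "day": 365}
    for view, rows in enumerate_views(example_dims, fact_rows=1_000_000):
        print(view, round(rows))
```

With d dimensions the loop visits 2^d candidate views, which is why selective pre-aggregation needs a cheap size estimate for each one. The uniform assumption tends to overestimate view sizes when the data is skewed, which is the kind of situation a skew-aware model such as a Pareto-based one is meant to address.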
Pages: 137-147
Number of pages: 10