Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals

Cited by: 666
Authors
Gray, J
Chaudhuri, S
Bosworth, A
Layman, A
Reichart, D
Venkatrao, M
Pellow, F
Pirahesh, H
Affiliations
[1] Microsoft Corp, Adv Technol Div, Microsoft Res, Redmond, WA 98052 USA
[2] IBM Res Corp, San Jose, CA 95120 USA
Keywords
data cube; data mining; aggregation; summarization; database; analysis; query
DOI
10.1023/A:1009726021843
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The novelty is that cubes are relations. Consequently, the cube operator can be imbedded in more complex non-procedural data analysis programs. The cube operator treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. This paper (1) explains the cube and roll-up operators, (2) shows how they fit in SQL, (3) explains how users can define new aggregate functions for cubes, and (4) discusses efficient techniques to compute the cube. Many of these features are being added to the SQL Standard.
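The idea of aggregating over every subset of the N dimensions can be illustrated with a minimal Python sketch. It is not the paper's algorithm or SQL syntax: the function name `cube`, the `ALL` string used as a marker, and the sales figures are all hypothetical, chosen only to show that the result is itself a flat relation containing the base aggregates, the sub-totals, and the grand total.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # illustrative marker for a dimension that has been aggregated away


def cube(rows, dims, measure):
    """Sum `measure` over every subset of `dims` (2**N groupings in total)."""
    totals = defaultdict(int)
    for r in range(len(dims) + 1):
        for kept in combinations(dims, r):
            for row in rows:
                # Keep the value of each retained dimension, collapse the rest.
                key = tuple(row[d] if d in kept else ALL for d in dims)
                totals[key] += row[measure]
    return dict(totals)


# Hypothetical input table (not taken from the paper).
sales = [
    {"model": "Chevy", "year": 1994, "units": 50},
    {"model": "Chevy", "year": 1995, "units": 85},
    {"model": "Ford", "year": 1994, "units": 60},
    {"model": "Ford", "year": 1995, "units": 75},
]

result = cube(sales, ["model", "year"], "units")
for key in sorted(result, key=str):
    print(key, result[key])
```

With two dimensions the sketch yields four base cells, four one-dimensional sub-totals, and one grand total. It rescans the input once per grouping, which is the naive approach; the paper's discussion of efficient computation is not reproduced here.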
Pages: 29-53
Page count: 25