Efficient OLAP query processing in distributed data warehouses

Cited by: 27
Authors
Akinde, MO
Böhlen, MH
Johnson, T
Lakshmanan, LVS
Srivastava, D
Affiliations
[1] Aalborg Univ, Dept Comp Sci, DK-9220 Aalborg, Denmark
[2] AT&T Labs Res, Florham Pk, NJ 07932 USA
[3] Univ British Columbia, Dept Comp Sci, Vancouver, BC V6T 1Z4, Canada
Keywords
Congestion control (communication) - Data acquisition - Data reduction - Distributed database systems - Internet - Network protocols - Optimization - Query languages
DOI
10.1016/S0306-4379(02)00051-0
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The success of Internet applications has led to an explosive growth in the demand for bandwidth from Internet Service Providers. Managing an Internet protocol network requires collecting and analyzing network data, such as flow-level traffic statistics. Such analyses can typically be expressed as OLAP queries, e.g., correlated aggregate queries and data cubes. Current-day OLAP tools for this task assume the availability of the data in a centralized data warehouse. However, the inherently distributed nature of data collection and the huge amount of data extracted at each collection point make it impractical to gather all data at a centralized site. One solution is to maintain a distributed data warehouse, consisting of local data warehouses at each collection point and a coordinator site, with most of the processing being performed at the local sites. In this paper, we consider the problem of efficient evaluation of OLAP queries over a distributed data warehouse. We have developed the Skalla system for this task. Skalla translates OLAP queries, specified as certain algebraic expressions, into distributed evaluation plans which are shipped to individual sites. A salient property of our approach is that only partial results are shipped, never parts of the detail data. We propose a variety of optimizations to minimize both the synchronization traffic and the local processing done at each site. Finally, we present an experimental study based on TPC-R data. Our results demonstrate the scalability of our techniques and quantify the performance benefits of the optimization techniques that have gone into the Skalla system. © 2002 Elsevier Science Ltd. All rights reserved.
Pages: 111-135
Page count: 25