Self-tuning cost modeling of user-defined functions in an object-relational DBMS

被引：10

作者：

He, Z ^{[1
]}

Lee, BS

Snapp, R

机构：

[1] La Trobe Univ, Dept Comp Sci, Bundoora, Vic 3086, Australia

[2] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA

来源：

ACM TRANSACTIONS ON DATABASE SYSTEMS | 2005年 / 30卷 / 03期

关键词：

algorithms; performance; K-nearest neighbors; quadtree; cost modeling; object relational DBMS; query optimization; self-tuning;

D O I：

10.1145/1093382.1093387

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

Query optimizers in object-relational database management systems typically require users to provide the execution cost models of user-defined functions (UDFs). Despite this need, however, there has been little work done to provide such a model. The existing approaches are static in that they require users to train the model a priori with pregenerated UDF execution cost data. Static approaches can not adapt to changing UDF execution patterns and thus degrade in accuracy when the UDF executions used for generating training data do not reflect the patterns of those performed during operation. This article proposes a new approach based on the recent trend of self-tuning DBMS by which the cost model is maintained dynamically and incrementally as UDFs are being executed online. In the context of UDF cost modeling, our approach faces a number of challenges, that is, it should work with limited memory, work with limited computation time, and adjust to the fluctuations in the execution costs (e.g., caching effect). In this article, we first provide a set of guidelines for developing techniques that meet these challenges, while achieving accurate and fast cost prediction with small overheads. Then, we present two concrete techniques developed under the guidelines. One is an instance-based technique based on the conventional k-nearest neighbor (KNN) technique which uses a multidimensional index like the R*-tree. The other is a summary-based technique which uses the quadtree to store summary values at multiple resolutions. We have performed extensive performance evaluations comparing these two techniques against existing histogram-based techniques and the KNN technique, using both real and synthetic UDFs/data sets. The results show our techniques provide better performance in most situations considered.

引用

页码：812 / 853

页数：42

共 46 条

[1]

Aboulnaga A, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P181, DOI 10.1145/304181.304198

[2]

[Anonymous], 1995, Kernel Smoothing Monographs on Statistics and Applied Probability

[3]

BEKMANN N, 1990, P ACM SIGMOD INT C M, P322

[4]

Bogartz R.S., 1994, An Introduction to the Analysis of Variance

[5]

Boulos J., 1997, Transactions of the Information Processing Society of Japan, V38, P2566

[6]

Boulos J., 1999, SIGMOD Record, V28, P22, DOI 10.1145/333607.333610

[7]

Bruno N., 2001, SIGMOD RECORD, P211

[8] A quad-tree based multiresolution approach for two-dimensional summary data [J].

Buccafurri, F ;

Saccà, D ;

Furfaro, F ;

Sirangelo, C .

SSDBM 2002: 15TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, 2003, :127-137

[9] FINDING PROTOTYPES FOR NEAREST NEIGHBOR CLASSIFIERS [J].

CHANG, CL .

IEEE TRANSACTIONS ON COMPUTERS, 1974, C 23 (11) :1179-1184

[10] Optimization of queries with user-defined predicates [J].

Chaudhuri, S ;

Shim, K .

ACM TRANSACTIONS ON DATABASE SYSTEMS, 1999, 24 (02) :177-228

← 1 2 3 4 5 →