The partitioned exponential file for database storage management

被引：27

作者：

Jermaine, Christopher ^{[1
]}

Omiecinski, Edward

Yee, Wai Gen

机构：

[1] Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32611 USA

[2] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA

[3] IIT, Dept Comp Sci, Chicago, IL 60616 USA

来源：

VLDB JOURNAL | 2007年 / 16卷 / 04期

关键词：

storage management; indexing; data warehousing;

D O I：

10.1007/s00778-005-0171-7

中图分类号：

TP3 [计算技术、计算机技术];

学科分类号：

0812 ;

摘要：

The rate of increase in hard disk storage capacity continues to outpace the rate of decrease in hard disk seek time. This trend implies that the value of a seek is increasing exponentially relative to the value of storage. With this trend in mind, we introduce the partitioned exponential file ( PE file) which is a generic storage manager that can be customized for many different types of data ( e. g., numerical, spatial, or temporal). The PE file is intended for use in environments with intense update loads and concurrent, analytic queries. Such an environment may be found, for example, in long-running scientific applications which can produce petabytes of data. For example, the proposed Large Synoptic Survey Telescope [ 36] will produce 50-100 petabytes of observational, scientific data over its multi-year lifetime. This database will never be taken off-line, so bursty update loads of tens of terabytes per day must be handled concurrently with data analysis. In the PE file, data are organized as a series of on-disk sorts with a careful, global organization. Because the PE file relies heavily on sequential I/O, only a fraction of a disk seek is required for a typical record insertion or retrieval. In addition to describing the PE file, we also detail a set of benchmarking experiments for T1SM, which is a PE file customized for use with multi-attribute data records ordered on a single numerical attribute. In our benchmarking, we implement and test many competing data organizations that can be used to index and store such data, such as the B+- Tree, the LSM-Tree, the Buffer Tree, the Stepped Merge Method, and the Y-Tree. As expected, no organization is the best over all benchmarks, but our experiments show that T1SM is the best choice in many situations, suggesting that it is the best overall. Specifically, T1SM performs exceptionally well in the case of a heavy query workload that must be handled concurrently with an intense insertion stream. Our experiments show that T1SM ( and its close cousin, the T2SM storage manager for spatial data) can handle very heavy mixed workloads of this type, and still maintain acceptably small query latencies.

引用

页码：417 / 437

页数：21

共 37 条

[1]

Agarwal PK, 2001, LECT NOTES COMPUT SC, V2076, P115

[2] THE INPUT OUTPUT COMPLEXITY OF SORTING AND RELATED PROBLEMS [J].

AGGARWAL, A ;

VITTER, JS .

COMMUNICATIONS OF THE ACM, 1988, 31 (09) :1116-1127

[3]

Anderson, 1997, P 16 ACM S OP SYST P

[4]

[Anonymous], 1995, P 21 INT C VER LARG

[5]

Arge L, 1995, LECT NOTES COMPUT SC, V955, P334

[6]

ARGE L, 1999, INT WORKSH ALE NEX, P328

[7] DECOMPOSABLE SEARCHING PROBLEMS [J].

BENTLEY, JL .

INFORMATION PROCESSING LETTERS, 1979, 8 (05) :244-251

[8] RAID - HIGH-PERFORMANCE, RELIABLE SECONDARY STORAGE [J].

CHEN, PM ;

LEE, EK ;

GIBSON, GA ;

KATZ, RH ;

PATTERSON, DA .

ACM COMPUTING SURVEYS, 1994, 26 (02) :145-185

[9] UBIQUITOUS B-TREE [J].

COMER, D .

COMPUTING SURVEYS, 1979, 11 (02) :121-137

[10]

CORMEN TH, 1992, INTRO ALGORITHMS

← 1 2 3 4 →