Online random shuffling of large database tables

被引：6

作者：

Jermaine, Christopher ^{[1
]}

机构：

[1] Univ Florida, CISE Dept, Gainesville, FL 32611 USA

来源：

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2007年 / 19卷 / 01期

基金：

美国国家科学基金会;

关键词：

sampling methods; database systems;

D O I：

10.1109/TKDE.2007.250586

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Many applications require a randomized ordering of input data. Examples include algorithms for online aggregation, data mining, and various randomized algorithms. Most existing work seems to assume that accessing the records from a large database in a randomized order is not a difficult problem. However, it turns out to be extremely difficult in practice. Using existing methods, randomization is either extremely expensive at the front end ( as data are loaded), or at the back end ( as data are queried). This paper presents a simple file structure which supports both efficient, online random shuffling of a large database, as well as efficient online sampling or randomization of the database when it is queried. The key innovation of our method is the introduction of a small degree of carefully controlled, rigorously monitored nonrandomness into the file.

引用

页码：73 / 84

页数：12

共 30 条

[1]

[Anonymous], 1999, P 5 ACM SIGKDD INT C

[2]

[Anonymous], P 1997 ACM SIGMOD IN

[3]

[Anonymous], 2001, P 18 INT C MACH LEAR

[4]

ANTOSHENKOV G, 1992, PROC INT CONF VERY L, P375

[5]

BARRON AR, 1988, P IEEE S INFORM THEO, V38, P1437

[6]

Bradley P. S., 1998, Proceedings Fourth International Conference on Knowledge Discovery and Data Mining, P9

[7]

Chaudhuri Surajit, 2004, P 2004 ACM SIGMOD IN, P287, DOI [10.1145/1007568.1007602, DOI 10.1145/1007568.1007602]

[8] Maintenance of discovered association rules in large databases: Art incremental updating technique [J].

Cheung, DW ;

Han, JW ;

Ng, VT ;

Wong, CY .

PROCEEDINGS OF THE TWELFTH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, 1996, :106-114

[9]

Cochran W. G., 1977, WILEY SERIES PROBABI

[10]

Garcia-Molina Hector., 2000, DATABASE SYSTEM IMPL

← 1 2 3 →