Online random shuffling of large database tables

被引:6
作者
Jermaine, Christopher [1 ]
机构
[1] Univ Florida, CISE Dept, Gainesville, FL 32611 USA
基金
美国国家科学基金会;
关键词
sampling methods; database systems;
D O I
10.1109/TKDE.2007.250586
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Many applications require a randomized ordering of input data. Examples include algorithms for online aggregation, data mining, and various randomized algorithms. Most existing work seems to assume that accessing the records from a large database in a randomized order is not a difficult problem. However, it turns out to be extremely difficult in practice. Using existing methods, randomization is either extremely expensive at the front end ( as data are loaded), or at the back end ( as data are queried). This paper presents a simple file structure which supports both efficient, online random shuffling of a large database, as well as efficient online sampling or randomization of the database when it is queried. The key innovation of our method is the introduction of a small degree of carefully controlled, rigorously monitored nonrandomness into the file.
引用
收藏
页码:73 / 84
页数:12
相关论文
共 30 条
  • [1] [Anonymous], 1999, P 5 ACM SIGKDD INT C
  • [2] [Anonymous], P 1997 ACM SIGMOD IN
  • [3] [Anonymous], 2001, P 18 INT C MACH LEAR
  • [4] ANTOSHENKOV G, 1992, PROC INT CONF VERY L, P375
  • [5] BARRON AR, 1988, P IEEE S INFORM THEO, V38, P1437
  • [6] Bradley P. S., 1998, Proceedings Fourth International Conference on Knowledge Discovery and Data Mining, P9
  • [7] Chaudhuri Surajit, 2004, P 2004 ACM SIGMOD IN, P287, DOI [10.1145/1007568.1007602, DOI 10.1145/1007568.1007602]
  • [8] Maintenance of discovered association rules in large databases: Art incremental updating technique
    Cheung, DW
    Han, JW
    Ng, VT
    Wong, CY
    [J]. PROCEEDINGS OF THE TWELFTH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, 1996, : 106 - 114
  • [9] Cochran W. G., 1977, WILEY SERIES PROBABI
  • [10] Garcia-Molina Hector., 2000, DATABASE SYSTEM IMPL