Distribution-based synthetic database generation techniques for itemset mining

Cited by: 10
Authors
Ramesh, G [1]
Zaki, MJ [1]
Maniatty, WA [1]
Affiliation
[1] Univ British Columbia, Vancouver, BC V5Z 1M9, Canada
Source
9th International Database Engineering & Application Symposium, Proceedings | 2005
DOI
10.1109/IDEAS.2005.22
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The resource requirements of frequent pattern mining algorithms depend mainly on the length distribution of the patterns to be mined in the database. Synthetic databases, which are used to benchmark the performance of algorithms, tend to have distributions that differ greatly from those observed in real datasets. In this paper we focus on the problem of synthetic database generation and propose algorithms that embed any given set of maximal pattern collections within the database. We make the following contributions:
1. A database generation technique is presented that takes k maximal itemset collections as input and constructs a database that produces these maximal collections as output when mined at k levels of support. To analyze the efficiency of the procedure, upper bounds are provided on the number of transactions in the generated database.
2. A compression method is used and extended to reduce the size of the output database. An optimization to the generation procedure is provided that can potentially reduce the number of transactions generated.
3. Preliminary experimental results are presented to demonstrate the feasibility of the generation technique.
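To make the idea of embedding a maximal collection concrete, the following is a minimal sketch for the simplest case only: a single collection (k = 1) and an absolute support count. It is not the procedure from the paper, which is not given in this record. It assumes the input itemsets form an antichain (no itemset contains another) and simply repeats each itemset as a transaction `min_support` times. The names `generate_database` and `maximal_frequent_itemsets` are hypothetical, and the miner is a brute-force check suitable only for tiny inputs.

```python
from itertools import combinations


def generate_database(maximal_itemsets, min_support):
    """Naive construction (an assumption, not the paper's algorithm):
    emit each maximal itemset as a transaction min_support times."""
    db = []
    for itemset in maximal_itemsets:
        db.extend([frozenset(itemset)] * min_support)
    return db


def support(db, itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for transaction in db if itemset <= transaction)


def maximal_frequent_itemsets(db, min_support):
    """Brute-force miner: enumerate all itemsets, keep the frequent ones,
    then keep those with no frequent proper superset. Exponential cost."""
    items = sorted(set().union(*db)) if db else []
    frequent = [
        frozenset(candidate)
        for size in range(1, len(items) + 1)
        for candidate in combinations(items, size)
        if support(db, frozenset(candidate)) >= min_support
    ]
    return {f for f in frequent if not any(f < g for g in frequent)}


# Hypothetical input: two maximal itemsets that overlap on item "c".
maximal = [{"a", "b", "c"}, {"c", "d"}]
db = generate_database(maximal, min_support=2)
mined = maximal_frequent_itemsets(db, min_support=2)
```

In this sketch every transaction is a copy of some input itemset, so an itemset has nonzero support only if it is contained in one of the inputs; mining the result at the same support level therefore recovers the input collection. The database size grows as the support level times the number of maximal itemsets, which is the kind of cost that the compression and optimization steps described in the abstract aim to reduce.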
Pages: 307-316
Page count: 10