The research of sampling for mining frequent itemsets

Cited by: 0
Authors
Hu, Xuegang [1]
Yu, Haitao [1]
Affiliation
[1] Hefei Univ Technol, Dept Comp & Informat Technol, Hefei 230009, Peoples R China
Source
ROUGH SETS AND KNOWLEDGE TECHNOLOGY, PROCEEDINGS | 2006 / Vol. 4062
Keywords
data mining; frequent itemsets; association rule; weighted sampling; statistical optimal sample size;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Efficiently mining frequent itemsets is the key step in extracting association rules from large-scale databases. Taking the min-support constraint of association rule mining into account, the paper proposes a weighted sampling algorithm for mining frequent itemsets. First, each transaction is assigned a weight. Then a sample whose size is the statistically optimal sample size for the database is drawn according to the transaction weights. Under this scheme the sample contains a large share of transactions that hold frequent itemsets with many items, so the frequent itemsets mined from the sample are similar to those mined from the original data. The algorithm can therefore shrink the sample size while still guaranteeing sample quality. Experiments verify the validity of the approach.
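The two steps named in the abstract (weight each transaction, then draw a sample of a given size according to the weights) can be sketched as follows. This is only an illustrative sketch, not the authors' algorithm: the abstract does not state the weight formula or how the statistically optimal sample size is computed, so the function `transaction_weight` (here simply the number of items in the transaction), the parameter `sample_size` (supplied by the caller), and the use of Efraimidis-Spirakis key-based sampling without replacement are all assumptions.

```python
import random
from typing import FrozenSet, List, Sequence

Transaction = FrozenSet[str]


def transaction_weight(transaction: Transaction) -> float:
    """Hypothetical weight: the number of items in the transaction.

    Assumption: the abstract only says that transactions with many items
    should be well represented; the paper's real formula is not given.
    """
    return float(len(transaction))


def weighted_sample(
    transactions: Sequence[Transaction],
    sample_size: int,
    seed: int = 0,
) -> List[Transaction]:
    """Draw up to sample_size transactions without replacement.

    Uses Efraimidis-Spirakis keys: each transaction gets u ** (1 / w)
    with u uniform in [0, 1), and the largest keys are kept, so heavier
    transactions are more likely to be selected. Transactions with
    non-positive weight are never selected.
    """
    if not 0 <= sample_size <= len(transactions):
        raise ValueError("sample_size must be between 0 and len(transactions)")
    rng = random.Random(seed)
    keyed = []
    for index, transaction in enumerate(transactions):
        weight = transaction_weight(transaction)
        if weight <= 0:
            continue  # e.g. empty transactions carry no itemsets
        keyed.append((rng.random() ** (1.0 / weight), index))
    keyed.sort(reverse=True)
    return [transactions[index] for _, index in keyed[:sample_size]]


def support(itemset: Transaction, transactions: Sequence[Transaction]) -> float:
    """Fraction of transactions that contain every item of itemset."""
    if not transactions:
        return 0.0
    hits = sum(1 for transaction in transactions if itemset <= transaction)
    return hits / len(transactions)
```

A frequent-itemset miner would then be run on the list returned by `weighted_sample` instead of on the full database. Because the draw is non-uniform, support measured on the sample is skewed toward itemsets that occur in long transactions; how the paper accounts for this is not stated in the abstract.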
Pages: 496-501
Page count: 6