A new sampling technique for association rule mining

Cited by: 14
Authors
Mahafzah, Basel A. [1]
Al-Badarneh, Amer F. [2 ]
Zakaria, Mohammed Z. [2 ]
Affiliations
[1] Univ Jordan, King Abdullah II Sch Informat Technol, Dept Comp Sci, Amman 11942, Jordan
[2] Jordan Univ Sci & Technol, Sch Comp & Informat Technol, Amman, Jordan
Keywords
sampling; parameterized sampling; data reduction; data mining; association rule mining; information retrieval
DOI
10.1177/0165551508100382
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Association Rule Mining (ARM) is a data mining technique used to extract hidden knowledge from datasets that an organization's decision makers can use to improve overall profit. However, performing ARM requires repeated passes over the entire database. For large databases, the input/output overhead of scanning the database is very significant. A popular way to speed up ARM is to apply the mining algorithm to a sample instead of the entire database. In this paper, a parameterized sampling algorithm for ARM is presented. This algorithm extracts sample datasets based on three parameters: transaction frequency, transaction length and transaction frequency-length. To evaluate its performance and accuracy, it is compared against a two-phase sampling-based algorithm using real and synthetic datasets. The experimental results show that the proposed sampling algorithm outperforms the two-phase sampling algorithm in some cases and achieves up to 98% accuracy.
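The general idea of mining a sample instead of the full database can be illustrated with a minimal sketch. It is not the parameterized algorithm described in the abstract, whose details are not given in this record: it uses plain uniform random sampling, counts only itemsets of a fixed size, and scores agreement with a Jaccard overlap. The function names, the fixed itemset size and the accuracy measure are all assumptions chosen for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def itemset_support(transactions, size=2):
    """Map each itemset of the given size to the fraction of transactions containing it."""
    counts = Counter()
    for t in transactions:
        # Deduplicate and sort items so each itemset has one canonical tuple form.
        for itemset in combinations(sorted(set(t)), size):
            counts[itemset] += 1
    n = len(transactions)
    return {k: v / n for k, v in counts.items()}


def frequent_itemsets(transactions, min_support, size=2):
    """Return the set of itemsets whose support is at least min_support."""
    supports = itemset_support(transactions, size)
    return {k for k, s in supports.items() if s >= min_support}


def sample_accuracy(transactions, sample_fraction, min_support, seed=0):
    """Compare frequent itemsets mined from a uniform random sample with those
    mined from the full database.

    Assumption: accuracy is taken to be the Jaccard overlap of the two result
    sets; the paper may define accuracy differently.
    """
    rng = random.Random(seed)
    k = max(1, int(len(transactions) * sample_fraction))
    sample = rng.sample(transactions, k)  # uniform sampling without replacement
    full = frequent_itemsets(transactions, min_support)
    approx = frequent_itemsets(sample, min_support)
    if not full and not approx:
        return 1.0
    return len(full & approx) / len(full | approx)


if __name__ == "__main__":
    db = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["b", "c"]]
    print(frequent_itemsets(db, min_support=0.5))
    print(sample_accuracy(db, sample_fraction=0.5, min_support=0.5))
```

The sample is scanned in place of the full database, so the cost of each pass shrinks roughly in proportion to the sample fraction, at the price of possibly missing or wrongly reporting some itemsets.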
Pages: 358-376
Number of pages: 19