An algorithm based on bit vector decomposition and hash linklist for mining frequent patterns in data streams

被引:0
作者
Ren, Jiadong [1 ,2 ]
Zhang, Yuanyuan [1 ,2 ]
Yu, Shiying [3 ]
机构
[1] College of Information Science and Engineering, Yanshan University, Qinhuangdao
[2] Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao City
[3] Hebei Provincial S and T Management Information Center, Shijiazhuang, 050021
关键词
Bit vector; Data streams; Frequent pattern; Hash linklist;
D O I
10.4156/jdcta.vol6.issue22.66
中图分类号
学科分类号
摘要
Most of algorithms based on bit vector for mining frequent pattern always produce candidate itemsets and scan the same data repeatedly; this increases the running time and the space consumption. In this paper, an algorithm BVDAHL based on bit vector decomposition and hash linklist for mining frequent patterns on data streams which solves the above issues, is proposed. The itemsets (transactions) whose corresponding number of 1 is k, denoted as k-one itemsets (k-one transactions). The arrival transactions are converted into bit vectors, and permutations and combinations are used to decompose converted results, then the decomposed itemsets are stored in the hash linklist. In hash function, according to the position of 1 of bit vector to compute the address, if the transaction has same address and same content that already exists in data domain, then the value adds one in count domain, otherwise the pointer in pointer domain points to the next node directly, it is no need to deal with conflict. Afterwards the property of anti-monotonic will be used to prune the k+1-one transactions in bit vector table if there exists infrequent k-one itemsets that decomposed by such k+1-one transactions. Then the algorithm repeats the above processes until the transaction in bit vector table can not be decomposed. Finally we will receive the frequent itemsets by comparing the obtained value of count domain in hash linklist with minSup. Experiment results show that BVDAHL is very efficiency and scalable.
引用
收藏
页码:570 / 579
页数:9
相关论文
共 15 条
  • [1] Manku G., Montuani R., Approximate frequency counts over data streams, International Conference on very large databases, pp. 346-357, (2002)
  • [2] Chi-Wing W.R., Wai-Chee F.A., Mining top-K frequent itemsets from data Streams, Journal of Data Mining &Knowledge Discovery, 13, 2, pp. 193-217, (2006)
  • [3] Giannella C., Han J., Pei J., Mining Frequent Patterns in Data Streams at Multiple Time Granularities, Proceedings of 2000 ACM SIGMOD International Conference on Management of Data, pp. 1-12, (2000)
  • [4] Han J., Pei J., Yin Y., Mining Frequent Patterns Without Candidate Generation, Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 1-12, (2000)
  • [5] Leung C., Khan Q., Li Z., Et al., CanTree: A Canonical-order Tree for Incremental Frequent-pattern Mining, Journal of Knowledge and Information Systems, 11, 3, pp. 287-311, (2007)
  • [6] Leung C., Khan Q., DSTree: A Tree Structure for the Mining of Frequent Sets From Data Streams, Proceedings of the 6th International Conference on Data Mining (ICDM), pp. 928-932, (2006)
  • [7] Agrawal R., Imielinski T., Swami A., Mining Association Rules Between Sets of Items in Large databases, Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 207-216, (1993)
  • [8] Dong J., Han M., BitTableFI: An efficient mining frequent itemset algorithm, Journal of Knowledge-Based Systems, 20, 4, pp. 329-335, (2007)
  • [9] He H., Feng S., Ren J., Wang Q., A BitTable-based Algorithm of Mining Frequent Itemsets, Journal of IJACT: International Journal of Advancements in Computing Technology, 3, 8, pp. 198-204, (2011)
  • [10] Nhan L., Thuy N., Chong C., BitApriori: An Apriori-based Frequent Itemsets Mining Using Bit Streams, Proceedings of International Conference on ICISA, pp. 1-6, (2010)