Efficient frequent itemsets mining through sampling and information granulation

Cited by: 17
Authors
Zhang, Zhongjie [1]
Pedrycz, Witold [2]
Huang, Jian [1]
Affiliations
[1] Natl Univ Def Technol, Coll Mech Engn & Automat, Changsha 410073, Hunan, Peoples R China
[2] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB T6R 2G7, Canada
Keywords
Frequent itemsets mining; Sampling; Information granulation; PATTERN TREE; BITTABLEFI; ALGORITHM
DOI
10.1016/j.engappai.2017.07.016
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this study, we propose an algorithm that forms high-quality approximate frequent itemsets from datasets with a large number of transactions. With high probability, the results produced by the algorithm contain all frequent itemsets; no itemset with support much lower than the minimum support is included, and the supports obtained by the algorithm are close to the real values. To avoid an overestimated sample size and significant computing overhead, the task of reducing the data is decomposed into three subproblems, which are solved one by one using sampling and information granulation. First, the algorithm obtains a rough support for every item by sampling and removes the infrequent items, which simplifies the data. Then, another sample is taken from the simplified data and clustered into information granules. After data reduction, these granules are mined by an improved Apriori algorithm. A tight guarantee on the quality of the final results is provided. The performance of the approach is quantified through a series of experiments. (C) 2017 Elsevier Ltd. All rights reserved.
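
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the sample sizes `n1` and `n2` and the tolerance `slack` are hypothetical parameters (the paper derives its own bounds, which are not given here), the information-granulation step is omitted, and a plain textbook Apriori stands in for the improved variant.

```python
import random
from collections import Counter
from itertools import combinations


def estimate_item_supports(transactions, sample_size, rng):
    """Estimate each item's relative support from a uniform sample drawn with replacement."""
    sample = [rng.choice(transactions) for _ in range(sample_size)]
    counts = Counter(item for t in sample for item in set(t))
    return {item: c / sample_size for item, c in counts.items()}


def drop_infrequent_items(transactions, supports, threshold):
    """Remove items whose estimated support is below the threshold."""
    keep = {item for item, s in supports.items() if s >= threshold}
    return [frozenset(t) & keep for t in transactions]


def apriori(transactions, min_support):
    """Textbook Apriori (a stand-in, not the paper's improved variant)."""
    n = len(transactions)
    result = {}
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): c / n for i, c in counts.items() if c / n >= min_support}
    k = 1
    while current:
        result.update(current)
        k += 1
        items = sorted({i for s in current for i in s})
        # Keep a size-k candidate only if all of its (k-1)-subsets are frequent.
        candidates = [
            frozenset(c)
            for c in combinations(items, k)
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        ]
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {c: v / n for c, v in counts.items() if v / n >= min_support}
    return result


def approximate_frequent_itemsets(transactions, min_support, n1, n2, slack, seed=0):
    """Hypothetical two-sample pipeline; n1, n2 and slack are illustrative assumptions."""
    rng = random.Random(seed)
    supports = estimate_item_supports(transactions, n1, rng)
    # A lowered threshold makes it less likely that a truly frequent item is dropped.
    reduced = drop_infrequent_items(transactions, supports, min_support - slack)
    second_sample = [rng.choice(reduced) for _ in range(n2)]
    # The paper clusters this sample into granules before mining; that step is skipped here.
    return apriori(second_sample, min_support - slack)
```

Lowering the threshold by `slack` in both stages mirrors the abstract's goal of keeping all frequent itemsets with high probability while excluding only itemsets whose support is much lower than the minimum support; the specific value needed for a formal guarantee is not stated in this record.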
Pages: 119-136
Number of pages: 18