CloFAST: closed sequential pattern mining using sparse and vertical id-lists

Cited by: 0
Authors
Fabio Fumarola
Pasqua Fabiana Lanotte
Michelangelo Ceci
Donato Malerba
Affiliations
[1] University of Bari “A. Moro”, Department of Computer Science
Source
Knowledge and Information Systems | 2016 / Vol. 48
Keywords
Sequential pattern mining; Closed sequences; Data mining; Itemset
DOI
Not available
Abstract
Sequential pattern mining is a computationally challenging task since algorithms have to generate and/or test a combinatorially explosive number of intermediate subsequences. In order to reduce complexity, some researchers focus on the task of mining closed sequential patterns. This not only results in increased efficiency, but also provides a way to compact results, while preserving the same expressive power of patterns extracted by means of traditional (non-closed) sequential pattern mining algorithms. In this paper, we present CloFAST, a novel algorithm for mining closed frequent sequences of itemsets. It combines a new data representation of the dataset, based on sparse id-lists and vertical id-lists, whose theoretical properties are studied in order to count the support of sequential patterns quickly, with a novel one-step technique both to check sequence closure and to prune the search space. Contrary to almost all the existing algorithms, which iteratively alternate itemset extension and sequence extension, CloFAST proceeds in two steps. Initially, all closed frequent itemsets are mined in order to obtain an initial set of sequences of size 1. Then, new sequences are generated by directly working on the sequences, without mining additional frequent itemsets. A thorough performance study with both real-world and artificially generated datasets empirically proves that CloFAST outperforms the state-of-the-art algorithms, both in time and memory consumption, especially when mining long closed sequences.
Pages: 429-463
Page count: 34
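The notion of a closed sequential pattern used in the abstract can be illustrated with a small brute-force sketch. This is not CloFAST and does not use its sparse or vertical id-lists. It only shows the standard definition: a pattern is closed if no super-sequence has the same support. The function names (`contains`, `support`, `is_closed`), the toy database, and the candidate list are all hypothetical and chosen for illustration.

```python
def contains(sequence, pattern):
    """Return True if `pattern` is a subsequence of `sequence`.

    Both are tuples of frozensets (sequences of itemsets). Each itemset of
    the pattern must be a subset of a distinct itemset of the sequence,
    with order preserved. Greedy leftmost matching is sufficient here.
    """
    pos = 0
    for itemset in pattern:
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(database, pattern):
    """Count the sequences in `database` that contain `pattern`."""
    return sum(1 for seq in database if contains(seq, pattern))


def is_closed(database, pattern, candidates):
    """Check closure of `pattern` against an explicit candidate list.

    Assumption: `candidates` already holds every pattern of interest.
    A real miner would never enumerate candidates this way.
    """
    s = support(database, pattern)
    return not any(
        other != pattern
        and contains(other, pattern)          # `other` is a super-sequence
        and support(database, other) == s     # with identical support
        for other in candidates
    )


fs = frozenset
# Toy database of three sequences of itemsets (illustrative data only).
db = [
    (fs("a"), fs("b"), fs("c")),
    (fs("a"), fs("b")),
    (fs("a"), fs("b"), fs("c")),
]

a = (fs("a"),)
ab = (fs("a"), fs("b"))
abc = (fs("a"), fs("b"), fs("c"))
candidates = [a, ab, abc]

for p in candidates:
    print([sorted(i) for i in p], support(db, p), is_closed(db, p, candidates))
```

In this toy database the pattern consisting of `a` alone has the same support as `a` followed by `b`, so it is not closed and carries no extra information. That is the sense in which closed patterns compact the result set without losing expressive power.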