CCSpan: Mining closed contiguous sequential patterns

Times cited: 47
Authors
Zhang, Jingsong [1]
Wang, Yinglin [2]
Yang, Dingyu [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept CSE, Shanghai 200030, Peoples R China
[2] Shanghai Univ Finance & Econ, Dept CST, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data mining; Sequential pattern mining; Closed sequential pattern; Contiguous constraint; Closed contiguous sequential pattern; Frequent patterns; Efficient approach; Algorithm
DOI
10.1016/j.knosys.2015.06.014
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Closed sequential pattern mining produces a more compact yet complete result set than general sequential pattern mining. However, conventional closed sequential pattern mining algorithms still tend to generate a large number of inefficient and redundant patterns, especially at low support thresholds or on pattern-rich databases. Motivated by the wide range of applications of sequential patterns under a contiguous constraint, we propose CCSpan (Closed Contiguous Sequential pattern mining), an efficient algorithm for mining closed contiguous sequential patterns. CCSpan yields a much more compact pattern set that carries the same information with respect to closed sequential patterns. Moreover, because closed contiguous sequential patterns are shorter, they are preferred for feature selection and sequence classification under the Minimum Description Length principle. CCSpan adopts a novel snippet-growth paradigm that generates a series of snippets as candidates, each attached to a set of item(s) that precisely records the pattern's occurrences in the database, and it leverages three pruning techniques to improve computational efficiency significantly. Experiments on both sparse and dense datasets demonstrate that CCSpan is efficient and scalable with respect to both database size and support threshold. (C) 2015 Elsevier B.V. All rights reserved.
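To make the target pattern set concrete, the following is a brute-force sketch of the definitions only. It is not the CCSpan algorithm: it uses no snippet growth and none of the three pruning techniques, and it enumerates every snippet explicitly. Its definitions are assumptions. A contiguous pattern is a run of consecutive items. Support counts the sequences that contain the pattern at least once. A frequent pattern is closed if no longer contiguous pattern that contains it has the same support. The function names `contiguous_support` and `closed_contiguous_patterns` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def contiguous_support(db: Sequence[Sequence[str]], min_sup: int) -> Dict[Pattern, int]:
    """Count, per pattern, how many sequences contain it as a contiguous run (brute force)."""
    counts: Dict[Pattern, int] = defaultdict(int)
    for seq in db:
        seen: Set[Pattern] = set()  # count each snippet at most once per sequence
        for i in range(len(seq)):
            for j in range(i + 1, len(seq) + 1):
                seen.add(tuple(seq[i:j]))
        for p in seen:
            counts[p] += 1
    return {p: c for p, c in counts.items() if c >= min_sup}


def closed_contiguous_patterns(db: Sequence[Sequence[str]], min_sup: int) -> Dict[Pattern, int]:
    """Keep frequent contiguous patterns that no one-item extension matches in support."""
    frequent = contiguous_support(db, min_sup)
    absorbed: Set[Pattern] = set()
    for q, sup in frequent.items():
        if len(q) < 2:
            continue
        # q extends q[:-1] on the right and q[1:] on the left by one item.
        for sub in (q[:-1], q[1:]):
            if frequent.get(sub) == sup:
                absorbed.add(sub)  # sub is not closed: q has the same support
    return {p: s for p, s in frequent.items() if p not in absorbed}


if __name__ == "__main__":
    db = [list("abcd"), list("abce"), list("bcf")]
    print(closed_contiguous_patterns(db, min_sup=2))
```

Checking only one-item extensions is sufficient. Suppose a longer snippet Q contains P and has the same support. Then some one-item extension of P also lies inside Q. Its support is at least sup(Q) and at most sup(P), so it equals sup(P). For the toy database above, the frequent patterns `a`, `b`, `c`, `ab`, `bc`, and `abc` reduce to the two closed patterns `bc` (support 3) and `abc` (support 2).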
Pages: 1-13
Number of pages: 13