Mining sequential patterns with periodic wildcard gaps

被引:71
作者
Wu, Youxi [1 ]
Wang, Lingling [1 ]
Ren, Jiadong [2 ]
Ding, Wei [3 ]
Wu, Xindong [4 ,5 ]
机构
[1] Hebei Univ Technol, Sch Comp Sci & Engn, Tianjin 300130, Peoples R China
[2] Yanshan Univ, Sch Informat Sci & Engn, Qinhuangdao 066004, Peoples R China
[3] Univ Massachusetts, Dept Comp Sci, Boston, MA 02125 USA
[4] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230009, Peoples R China
[5] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
关键词
Sequential pattern mining; Periodic wildcard gap; Pattern matching; Heuristic algorithm; Nettree;
D O I
10.1007/s10489-013-0499-4
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Mining frequent patterns with periodic wildcard gaps is a critical data mining problem to deal with complex real-world problems. This problem can be described as follows: given a subject sequence, a pre-specified threshold, and a variable gap-length with wildcards between each two consecutive letters. The task is to gain all frequent patterns with periodic wildcard gaps. State-of-the-art mining algorithms which use matrices or other linear data structures to solve the problem not only consume a large amount of memory but also run slowly. In this study, we use an Incomplete Nettree structure (the last layer of a Nettree which is an extension of a tree) of a sub-pattern P to efficiently create Incomplete Nettrees of all its super-patterns with prefix pattern P and compute the numbers of their supports in a one-way scan. We propose two new algorithms, MAPB (Mining sequentiAl Pattern using incomplete Nettree with Breadth first search) and MAPD (Mining sequentiAl Pattern using incomplete Nettree with Depth first search), to solve the problem effectively with low memory requirements. Furthermore, we design a heuristic algorithm MAPBOK (MAPB for tOp-K) based on MAPB to deal with the Top-K frequent patterns for each length. Experimental results on real-world biological data demonstrate the superiority of the proposed algorithms in running time and space consumption and also show that the pattern matching approach can be employed to mine special frequent patterns effectively.
引用
收藏
页码:99 / 116
页数:18
相关论文
共 33 条
  • [1] AGRAWAL R, 1995, PROC INT CONF DATA, P3, DOI 10.1109/ICDE.1995.380415
  • [2] HUC-Prune: an efficient candidate pruning technique to mine high utility patterns
    Ahmed, Chowdhury Farhan
    Tanbeer, Syed Khairuzzaman
    Jeong, Byeong-Soo
    Lee, Young-Koo
    [J]. APPLIED INTELLIGENCE, 2011, 34 (02) : 181 - 198
  • [3] Efficient string matching with wildcards and length constraints
    Chen, Gong
    Wu, Xindong
    Zhu, Xingquan
    Arslan, Abdullah N.
    He, Yu
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2006, 10 (04) : 399 - 419
  • [4] Ding BL, 2009, PROC INT CONF DATA, P1024, DOI 10.1109/ICDE.2009.104
  • [5] Sequential Pattern Mining with Wildcards
    Xie, Fei
    Wu, Xindong
    Hu, Xuegang
    Gao, Jun
    Guo, Dan
    Fei, Yulian
    Hua, Ertian
    [J]. 22ND INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2010), PROCEEDINGS, VOL 1, 2010,
  • [6] Pattern matching with wildcards and gap-length constraints based on a centrality-degree graph
    Guo, Dan
    Hu, Xuegang
    Xie, Fei
    Wu, Xindong
    [J]. APPLIED INTELLIGENCE, 2013, 39 (01) : 57 - 74
  • [7] He Y, 2007, IRI 2007: PROCEEDINGS OF THE 2007 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION, P329
  • [8] Mining sequential patterns in the B2B environment
    Hu, Ya-Han
    Chen, Yen-Liang
    Tang, Kwei
    [J]. JOURNAL OF INFORMATION SCIENCE, 2009, 35 (06) : 677 - 694
  • [9] Mining the change of customer behavior in fuzzy time-interval sequential patterns
    Huang, Tony Cheng-Kui
    [J]. APPLIED SOFT COMPUTING, 2012, 12 (03) : 1068 - 1086
  • [10] Mining minimal distinguishing subsequence patterns with gap constraints
    Ji, Xiaonan
    Bailey, James
    Dong, Guozhu
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2007, 11 (03) : 259 - 286