A SINGLE-SCAN ALGORITHM FOR MINING SEQUENTIAL PATTERNS FROM DATA STREAMS

Cited by: 0
Authors
Li, Hua-Fu [1 ]
Ho, Chin-Chuan [2 ]
Chen, Hsuan-Sheng [2 ]
Lee, Suh-Yin [2 ]
Affiliations
[1] Kainan Univ, Dept Informat Management, Tao Yuan 338, Taiwan
[2] Natl Chiao Tung Univ, Dept Comp Sci, Hsinchu 300, Taiwan
Source
INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL | 2012 / Vol. 8 / Issue 3A
Keywords
Data streams; Data mining; Data stream mining; Sequential pattern mining; INCREMENTAL UPDATE; FREQUENT ITEMSETS;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sequential pattern mining (SPAM) is one of the most interesting research issues in data mining. In this paper, a new research problem of mining data streams for sequential patterns is defined. A data stream is an unbounded sequence of data elements arriving at a rapid rate. Because of these characteristics, mining sequential patterns from data streams is more difficult than mining them from large static databases, which makes it a challenging research issue in data mining and knowledge discovery. An efficient single-pass algorithm, called IncSpam (Incremental Sequential pattern mining of streaming itemset-sequences), is proposed for discovering sequential patterns from streaming itemset-sequences over extended sliding window models. Within the IncSpam framework, a new sliding window model, called CSW-BV (Customer Sliding Window with Bit-Vectors), and an extended lexicographic tree-based data structure, called LexSeq-Tree (Lexicographic Sequence Tree), are developed to reduce the time and memory needed to slide the windows over streaming data and to maintain all sequential patterns of the current sliding windows. Experimental results show that the proposed method is an efficient single-pass algorithm for mining sequential patterns from streaming data.
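As a rough illustration of the general idea of a per-customer sliding window kept as bit-vectors, the following Python sketch may help. It is not the authors' CSW-BV structure or their algorithm, whose details this record does not give. The class name `CustomerWindow`, the helper `support`, the window width, and the bit layout (newest transaction in the lowest bit) are all assumptions made for this example.

```python
from collections import defaultdict


class CustomerWindow:
    """Hypothetical per-customer window: one integer bit-vector per item.

    Bit 0 is the newest transaction; higher bits are older transactions.
    """

    def __init__(self, width):
        self.width = width
        self.mask = (1 << width) - 1   # keeps only the last `width` transactions
        self.bits = defaultdict(int)   # item -> bit-vector of occurrences

    def add_transaction(self, items):
        # Slide the window: shift every vector by one position and
        # drop the bit of the transaction that falls out of the window.
        for item in list(self.bits):
            shifted = (self.bits[item] << 1) & self.mask
            if shifted:
                self.bits[item] = shifted
            else:
                del self.bits[item]    # item no longer occurs in the window
        # Record the new transaction in the lowest bit.
        for item in items:
            self.bits[item] |= 1

    def contains_sequence(self, first, second):
        """True if `first` occurs in a strictly earlier transaction than `second`."""
        a = self.bits.get(first, 0)
        b = self.bits.get(second, 0)
        if a == 0:
            return False
        earliest = a.bit_length() - 1      # oldest occurrence of `first`
        later_mask = (1 << earliest) - 1   # all transactions newer than that one
        return (b & later_mask) != 0


def support(windows, first, second):
    """Number of customers whose current window contains <first, second>."""
    return sum(w.contains_sequence(first, second) for w in windows.values())
```

For example, with a window width of 3, a customer who buys `a` and then `b` in separate transactions supports the sequence `<a, b>`. After two further transactions the `a` bit is shifted out of the window and the customer no longer supports it. Storing occurrences as integers means that sliding the window and testing order each cost a few bitwise operations per item.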
Pages: 1799-1820
Number of pages: 22