Efficient Mining of Robust Closed Weighted Sequential Patterns Without Information Loss

Cited by: 21
Authors
Yun, Unil [1]
Pyun, Gwangbum [1]
Yoon, Eunchul [2]
Affiliations
[1] Sejong Univ, Dept Comp Engn, Seoul, South Korea
[2] Konkuk Univ, Dept Elect Engn, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Closed weighted sequential pattern mining; information loss; sequential pattern; FREQUENT PATTERNS; ALGORITHM;
DOI
10.1142/S0218213015500074
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sequential pattern mining has become one of the most important topics in data mining. It has broad applications, such as analyzing customer purchase data, Web access patterns, network traffic data, and DNA sequences. Previous studies have concentrated on reducing redundancy among sequential patterns and on finding meaningful patterns in huge datasets. Within sequential pattern mining, closed sequential pattern mining and weighted sequential pattern mining are the two main approaches. Closed sequential pattern mining finds representative sequential patterns that carry exactly the same knowledge as the complete set of frequent sequential patterns, while weight-based sequential pattern mining discovers important sequential patterns by considering the importance of each sequential pattern. In this paper, we study the problem of mining robust closed weighted sequential patterns from large sequence databases by integrating these two paradigms. We first show that the joining order between the weight constraints and the closure property in sequential pattern mining leads to different sets of results. From our analysis of joining orders, we propose robust closed weighted sequential pattern mining and show how to discover representative important sequential patterns without information loss. Through performance tests, we show that our approach gives high performance in terms of efficiency, effectiveness, memory usage, and scalability.
Pages: 28