Mining top-k co-occurrence items with sequential pattern

Cited by: 32
Authors
Tung Kieu [1]
Bay Vo [2,3]
Tuong Le [4,5]
Deng, Zhi-Hong [6]
Bac Le [1]
Affiliations
[1] Univ Sci, VNU HCM, Fac Informat Technol, Ho Chi Minh City, Vietnam
[2] Ho Chi Minh City Univ Technol, Fac Informat Technol, Ho Chi Minh City, Vietnam
[3] Sejong Univ, Coll Elect & Informat Engn, Seoul, South Korea
[4] Ton Duc Thang Univ, Div Data Sci, Ho Chi Minh City, Vietnam
[5] Ton Duc Thang Univ, Fac Informat Technol, Ho Chi Minh City, Vietnam
[6] Peking Univ, Sch Elect Engn & Comp Sci, Minist Educ, Key Lab Machine Percept, Beijing 100871, Peoples R China
Keywords
Top-k mining; Co-occurrence sequential mining; Sequential pattern mining; EFFICIENT; ALGORITHM
DOI
10.1016/j.eswa.2017.05.021
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Frequent sequential pattern mining has become one of the most important tasks in data mining. It has many applications, such as sequential analysis, classification, and prediction. How to generate candidates and how to control the combinatorically explosive number of intermediate subsequences are the most difficult problems. Intelligent systems such as recommender systems, expert systems, and business intelligence systems use only a few patterns, namely those that satisfy a number of defined conditions. Challenges include the mining of top-k patterns, top-rank-k patterns, closed patterns, and maximal patterns. In many cases, end users need to find itemsets that occur with a sequential pattern. Therefore, this paper proposes approaches for mining top-k co-occurrence items usually found with a sequential pattern. The Naive Approach Mining (NAM) algorithm discovers top-k co-occurrence items by directly scanning the sequence database to determine the frequency of items. The Vertical Approach Mining (VAM) algorithm is based on vertical database scanning. The Vertical with Index Approach Mining (VIAM) algorithm is based on a vertical database with index scanning. VAM and VIAM use pruning strategies to reduce the search space, thus improving performance. VAM and VIAM are especially effective in mining the co-occurrence items of a long input pattern. The three algorithms were evaluated using real-world databases. The experimental results show that these algorithms perform well, especially VAM and VIAM. (C) 2017 Elsevier Ltd. All rights reserved.
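The naive idea described in the abstract (scan the sequence database directly and count item frequencies) can be illustrated with a minimal sketch. This is not the authors' NAM implementation: the abstract does not give the formal definitions, so the sketch assumes that a sequence is a list of itemsets, that a sequence supports the pattern when the pattern's itemsets appear in order as subsets of the sequence's itemsets, and that a co-occurrence item is any item of a supporting sequence that is not already in the pattern. The names `contains` and `naive_top_k` are hypothetical.

```python
from collections import Counter


def contains(sequence, pattern):
    """Return True if `pattern` occurs in `sequence` as an ordered subsequence.

    Both arguments are lists of sets; a pattern itemset matches a sequence
    itemset when it is a subset of it (an assumption of this sketch).
    """
    pos = 0  # index of the next pattern itemset still to be matched
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def naive_top_k(database, pattern, k):
    """Scan every sequence and rank items that co-occur with `pattern`.

    An item is counted once per supporting sequence. Ties are broken
    alphabetically so the result is deterministic.
    """
    pattern_items = set().union(*pattern)
    counts = Counter()
    for sequence in database:
        if contains(sequence, pattern):
            # Items of this sequence that are not part of the pattern itself.
            counts.update(set().union(*sequence) - pattern_items)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


# Toy database (made up for illustration, not from the paper).
database = [
    [{"a"}, {"b", "c"}, {"d"}],
    [{"a"}, {"c"}, {"b"}, {"d"}],
    [{"b"}, {"a"}, {"e"}],
    [{"a", "e"}, {"b"}, {"d"}],
]
pattern = [{"a"}, {"b"}]
print(naive_top_k(database, pattern, 2))
```

Every query in this sketch rescans the whole database, which is the cost that a vertical (per-item) representation with pruning, as the abstract describes for VAM and VIAM, aims to reduce.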
Pages: 123-133
Page count: 11