Mining top-rank-k frequent weighted itemsets using WN-list structures and an early pruning strategy

Cited by: 16
Authors
Bay Vo [1]
Huong Bui [2, 3]
Thanh Vo [4]
Tuong Le [5, 6]
Affiliations
[1] Ho Chi Minh City Univ Technol HUTECH, Fac Informat Technol, Ho Chi Minh City 700000, Vietnam
[2] Univ Informat Technol, Fac Comp Sci, Ho Chi Minh City 700000, Vietnam
[3] Vietnam Natl Univ, Ho Chi Minh City 700000, Vietnam
[4] Duy Tan Univ, Inst Res & Dev, Da Nang 550000, Vietnam
[5] Ton Duc Thang Univ, Informetr Res Grp, Ho Chi Minh City 700000, Vietnam
[6] Ton Duc Thang Univ, Fac Informat Technol, Ho Chi Minh City 700000, Vietnam
Keywords
Data mining; Pattern mining; Frequent weighted itemsets; N-list structure; Top-rank-k pattern; EFFICIENT ALGORITHMS; N-LIST; ERASABLE ITEMSETS; ASSOCIATION RULES; CLOSED ITEMSETS; PATTERNS
DOI
10.1016/j.knosys.2020.106064
CLC number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Frequent weighted itemsets (FWIs) are a variation of frequent itemsets (FIs) that take into account the different importance, or weight, of each item. Many algorithms for mining FWIs have been introduced recently. However, traditional FWI mining algorithms produce a large number of FWIs, which makes the results difficult to use in intelligent systems. This study therefore first introduces the problem of mining top-rank-k FWIs from weighted databases, which combines the mining and ranking phases into one and avoids finding all FWIs, so that the results are more usable in practical applications. As the second contribution, three baseline algorithms for mining top-rank-k FWIs are developed, namely TFWIT, TFWID and TFWIN, which use the state-of-the-art tidset, diffset and WN-list data structures, respectively. Next, the study proposes a threshold raising strategy and an early pruning strategy supported by a new theorem to mine top-rank-k FWIs effectively. An improved version of TFWIN, named TFWIN+, employs these strategies and is more efficient than the original version. Finally, empirical evaluations of processing time and memory usage were conducted to show the effectiveness of TFWIN+. The experimental results show that TFWIN+ outperforms TFWIT, TFWID and TFWIN for mining top-rank-k FWIs. (C) 2020 Elsevier B.V. All rights reserved.
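To make the problem statement concrete, the following is a minimal brute-force sketch of what a top-rank-k FWI result looks like. It is not TFWIT, TFWID, TFWIN or TFWIN+, and it uses none of the tidset, diffset or WN-list structures or the pruning strategies named in the abstract. It assumes the weighted-support definition commonly used in the FWI literature (transaction weight = mean weight of the transaction's items; weighted support of an itemset = sum of the weights of the transactions containing it, divided by the sum of all transaction weights), which the abstract itself does not state. The function names, the toy database and the item weights are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def transaction_weight(transaction, weights):
    """Assumed definition: mean weight of the items in the transaction."""
    return Fraction(sum(weights[item] for item in transaction)) / len(transaction)


def top_rank_k_fwis(transactions, weights, k):
    """Brute-force illustration: enumerate every itemset, compute its weighted
    support, and keep the itemsets whose support is among the k highest
    distinct values. Exponential in the number of items; toy inputs only."""
    tws = [transaction_weight(t, weights) for t in transactions]
    total = sum(tws)
    items = sorted({item for t in transactions for item in t})

    support = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            members = set(candidate)
            covered = sum(tw for t, tw in zip(transactions, tws) if members <= t)
            if covered > 0:  # ignore itemsets that occur in no transaction
                support[candidate] = covered / total

    # Ranks are defined over distinct support values, so ties share a rank.
    top_values = sorted(set(support.values()), reverse=True)[:k]
    return [
        (rank, value, [c for c, ws in support.items() if ws == value])
        for rank, value in enumerate(top_values, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical toy database and item weights.
    db = [{"A", "B"}, {"A"}, {"B", "C"}]
    w = {"A": Fraction(3, 5), "B": Fraction(1, 5), "C": Fraction(2, 5)}
    for rank, ws, itemsets in top_rank_k_fwis(db, w, 2):
        print(rank, ws, itemsets)
```

Exact `Fraction` arithmetic is used so that itemsets with equal weighted support compare as equal and share one rank, which floating-point values would not guarantee.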
Pages: 12