CLS-Miner: efficient and effective closed high-utility itemset mining

被引:45
作者
Thu-Lan Dam [1 ,2 ]
Li, Kenli [1 ,3 ,4 ]
Fournier-Viger, Philippe [5 ]
Quang-Huy Duong [6 ]
机构
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[2] Hanoi Univ Ind, Fac Informat Technol, Hanoi, Vietnam
[3] Natl Univ Def Technol, CIC HPC, Changsha 410073, Hunan, Peoples R China
[4] Natl Supercomp Ctr Changsha, Changsha 410082, Hunan, Peoples R China
[5] Harbin Inst Technol, Shenzhen Grad Sch, Sch Nat Sci & Humanities, Shenzhen 518055, Peoples R China
[6] Norwegian Univ Sci & Technol, Dept Comp Sci, Trondheim, Norway
基金
中国国家自然科学基金;
关键词
utility mining; high-utility itemset mining; closed itemset mining; closed high-utility itemset mining; ALGORITHMS;
D O I
10.1007/s11704-016-6245-4
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
High-utility itemset mining (HUIM) is a popular data mining task with applications in numerous domains. However, traditional HUIM algorithms often produce a very large set of high-utility itemsets (HUIs). As a result, analyzing HUIs can be very time consuming for users. Moreover, a large set of HUIs also makes HUIM algorithms less efficient in terms of execution time and memory consumption. To address this problem, closed high-utility itemsets (CHUIs), concise and lossless representations of all HUIs, were proposed recently. Although mining CHUIs is useful and desirable, it remains a computationally expensive task. This is because current algorithms often generate a huge number of candidate itemsets and are unable to prune the search space effectively. In this paper, we address these issues by proposing a novel algorithm called CLS-Miner. The proposed algorithm utilizes the utility-list structure to directly compute the utilities of itemsets without producing candidates. It also introduces three novel strategies to reduce the search space, namely chain-estimated utility co-occurrence pruning, lower branch pruning, and pruning by coverage. Moreover, an effective method for checking whether an itemset is a subset of another itemset is introduced to further reduce the time required for discovering CHUIs. To evaluate the performance of the proposed algorithm and its novel strategies, extensive experiments have been conducted on six benchmark datasets having various characteristics. Results show that the proposed strategies are highly efficient and effective, that the proposed CLS-Miner algorithm outperforms the current state-of-the-art CHUD and CHUI-Miner algorithms, and that CLS-Miner scales linearly.
引用
收藏
页码:357 / 381
页数:25
相关论文
共 36 条
[1]  
Agrawal R., P 20 INT C VERY LARG, DOI DOI 10.1055/S-2007-996789
[2]   Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases [J].
Ahmed, Chowdhury Farhan ;
Tanbeer, Syed Khairuzzaman ;
Jeong, Byeong-Soo ;
Lee, Young-Koo .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (12) :1708-1721
[3]  
Chan R, 2003, THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, P19
[4]   A Parallel Random Forest Algorithm for Big Data in a Spark Cloud Computing Environment [J].
Chen, Jianguo ;
Li, Kenli ;
Tang, Zhuo ;
Bilal, Kashif ;
Yu, Shui ;
Weng, Chuliang ;
Li, Keqin .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, 28 (04) :919-933
[5]   PrePost+: An efficient N-lists-based algorithm for mining frequent itemsets via Children-Parent Equivalence pruning [J].
Deng, Zhi-Hong ;
Lv, Sheng-Long .
EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (13) :5424-5432
[6]  
Fournier-Viger P., 2014, JOURNAL OF MACHINE L, V15, P3569
[7]  
Fournier-Viger P, 2016, FHM FASTER HIGH UTIL, P115
[8]  
Fournier-Viger P, 2014, LECT NOTES ARTIF INT, V8933, P16, DOI 10.1007/978-3-319-14717-8_2
[9]  
Fournier-Viger P, 2014, LECT NOTES ARTIF INT, V8933, P30, DOI 10.1007/978-3-319-14717-8_3
[10]  
Fournier-Viger P, 2014, LECT NOTES COMPUT SC, V8436, P83, DOI 10.1007/978-3-319-06483-3_8