Mining Closed High Utility Itemsets based on Propositional Satisfiability

被引:9
作者
Hidouri, Amel [1 ,2 ]
Jabbour, Said [2 ]
Raddaoui, Badran [3 ]
Ben Yaghlane, Boutheina [1 ]
机构
[1] Univ Tunis, LARODEC, Tunis, Tunisia
[2] Univ Artois, CRIL, CNRS, UMR 8188, Arras, France
[3] Inst Polytech Paris, Telecom SudParis, SAMOVAR, Paris, France
关键词
Data Mining; High Utility; Symbolic Artificial Intelligence; Propositional Satisfiability;
D O I
10.1016/j.datak.2021.101927
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
A high utility itemset mining problem is the question of recognizing a set of items that have utility values greater than a given user utility threshold. This generalization of the classical problem of frequent itemset mining is a useful and well-known task in data analysis and data mining, since it is used in a wide range of real applications. In this paper, we first propose to use symbolic Artificial Intelligence for computing the set of all closed high utility itemsets from transaction databases. Our approach is based on reduction to enumeration problems of propositional satisfiability. Then, we enhance the efficiency of our SAT-based approach using the weighted clique cover problem. After that, in order to improve scalability, a decomposition technique is applied to derive smaller and independent sub-problems in order to capture all the closed high utility itemsets. Clearly, our SAT-based encoding can be constantly enhanced by integrating the last improvements in powerful SAT solvers and models enumeration algorithms. Finally, through empirical evaluations on different real-world datasets, we demonstrate that the proposed approach is very competitive with state-of-the-art specialized algorithms for high utility itemsets mining, while being sufficiently flexible to take into account additional constraints to finding closed high utility itemsets.
引用
收藏
页数:15
相关论文
共 35 条
[1]   Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases [J].
Ahmed, Chowdhury Farhan ;
Tanbeer, Syed Khairuzzaman ;
Jeong, Byeong-Soo ;
Lee, Young-Koo .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (12) :1708-1721
[2]   Enumerating Non-redundant Association Rules Using Satisfiability [J].
Boudane, Abdelhamid ;
Jabbour, Said ;
Sais, Lakhdar ;
Salhi, Yakoub .
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2017, PT I, 2017, 10234 :824-836
[3]  
Boudane Abdelhamid, 2016, P 25 INT JOINT C ART, P2472
[4]   Finding Maximal Cliques in Massive Networks [J].
Cheng, James ;
Ke, Yiping ;
Fu, Ada Wai-Chee ;
Yu, Jeffrey Xu ;
Zhu, Linhong .
ACM TRANSACTIONS ON DATABASE SYSTEMS, 2011, 36 (04)
[5]   A MACHINE PROGRAM FOR THEOREM-PROVING [J].
DAVIS, M ;
LOGEMANN, G ;
LOVELAND, D .
COMMUNICATIONS OF THE ACM, 1962, 5 (07) :394-397
[6]  
De Raedt L., 2008, KDD, P204
[7]   A Parallel SAT-Based Framework for Closed Frequent Itemsets Mining [J].
Dlala, Imen Ouled ;
Jabbour, Said ;
Raddaoui, Badran ;
Sais, Lakhdar .
PRINCIPLES AND PRACTICE OF CONSTRAINT PROGRAMMING, 2018, 11008 :570-587
[8]   Efficient high utility itemset mining using buffered utility-lists [J].
Duong, Quang-Huy ;
Fournier-Viger, Philippe ;
Ramampiaro, Heri ;
Norvag, Kjetil ;
Dam, Thu-Lan .
APPLIED INTELLIGENCE, 2018, 48 (07) :1859-1877
[9]   The maximum clique enumeration problem: algorithms, applications, and implementations [J].
Eblen, John D. ;
Phillips, Charles A. ;
Rogers, Gary L. ;
Langston, Michael A. .
BMC BIOINFORMATICS, 2012, 13 :S5
[10]   An extensible SAT-solver [J].
Eén, N ;
Sörensson, N .
THEORY AND APPLICATIONS OF SATISFIABILITY TESTING, 2004, 2919 :502-518