TKFIM: Top-K frequent itemset mining technique based on equivalence classes

Cited by: 5
Authors
Iqbal, Saood [1]
Shahid, Abdul [1]
Roman, Muhammad [1]
Khan, Zahid [2]
Al-Otaibi, Shaha [3]
Yu, Lisu [4, 5]
Affiliations
[1] Kohat Univ Sci & Technol, Inst Comp, Kohat, Kpk, Pakistan
[2] Prince Sultan Univ, Robot & Internet Things Lab, Riyadh, Saudi Arabia
[3] Princess Nourah Bint Abdulrahman Univ, Coll Comp & Informat Sci, Informat Syst Dept, Riyadh, Saudi Arabia
[4] Nanchang Univ, Sch Informat Engn, Nanchang, Jiangxi, Peoples R China
[5] Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing, Peoples R China
Keywords
Frequent Itemsets; Support Threshold; Algorithm Analysis; Top-k Frequent Itemsets; Artificial Intelligence; EFFICIENT; THRESHOLD; PATTERNS
DOI
10.7717/peerj-cs.385
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Frequent itemset mining is a significant subject in data mining research. Over the last ten years, innovative development has caused the quantity of data to grow exponentially, which poses new challenges for frequent itemset (FI) mining applications. Recent algorithms, both threshold-based and size-based, may produce misleading information. The threshold value plays a central role in generating frequent itemsets from a given dataset, and selecting a support threshold is very difficult for users who are unaware of the dataset's characteristics. Algorithms that find FIs without a support threshold, however, perform poorly because of heavy computation. We therefore propose a method for discovering FIs without a support threshold, called Top-k frequent itemsets mining (TKFIM). It uses equivalence classes and set-theory concepts to mine FIs. The proposed procedure does not miss any FIs, so accurate frequent patterns are mined. The results are compared with state-of-the-art techniques, namely Top-k miner and Build Once and Mine Once (BOMO). TKFIM outperforms these approaches in terms of execution and performance, achieving gains of 92.70, 35.87, 28.53, and 81.27 percent over Top-k miner on the Chess, Mushroom, Connect, and T10I4D100K datasets, respectively. Similarly, it achieves performance gains of 97.14, 100, 78.10, and 99.70 percent over BOMO on the Chess, Mushroom, Connect, and T10I4D100K datasets, respectively. We therefore argue that the proposed procedure may be adopted on large datasets for better performance.
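The abstract's idea of mining without a user-supplied support threshold can be illustrated with a small sketch. This is not the authors' TKFIM implementation; it is a generic Eclat-style miner that groups itemsets into prefix-based equivalence classes, computes support by intersecting transaction-id sets, and raises an internal threshold as better itemsets are found. The function name `top_k_itemsets`, the toy transactions, and the definition of "top-k" as the k itemsets with the highest support (ties broken by discovery order) are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def top_k_itemsets(transactions, k):
    """Return k itemsets with the highest support as (support, itemset) pairs.

    Illustrative sketch only (assumed definition of top-k, not TKFIM itself).
    """
    # Vertical layout: item -> set of ids of the transactions that contain it.
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)

    heap = []  # min-heap of (support, itemset); the root is the weakest kept itemset

    def useful(support):
        # An itemset matters if there is room left or it beats the weakest kept one.
        return support > 0 and (len(heap) < k or support > heap[0][0])

    def offer(itemset, support):
        if len(heap) < k:
            heapq.heappush(heap, (support, itemset))
        else:
            heapq.heapreplace(heap, (support, itemset))

    def expand(eq_class):
        # eq_class: (itemset, tidset) pairs that share all items except the last.
        for i, (itemset_a, tids_a) in enumerate(eq_class):
            children = []
            for itemset_b, tids_b in eq_class[i + 1:]:
                tids = tids_a & tids_b  # support of the union via set intersection
                if useful(len(tids)):
                    candidate = itemset_a + (itemset_b[-1],)
                    offer(candidate, len(tids))
                    children.append((candidate, tids))
            if children:
                expand(children)  # children form the next equivalence class

    # Single items, most frequent first so the internal threshold rises quickly.
    singles = sorted(
        (((item,), tids) for item, tids in tidsets.items()),
        key=lambda pair: (-len(pair[1]), pair[0]),
    )
    for itemset, tids in singles:
        if useful(len(tids)):
            offer(itemset, len(tids))
    expand(singles)
    return sorted(heap, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    toy = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"], ["b", "c"]]
    print(top_k_itemsets(toy, 3))  # [(4, ('a',)), (3, ('b',)), (3, ('c',))]
```

Pruning is safe here because support can only shrink as an itemset grows: once a candidate cannot enter the current top k, none of its supersets can either.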
Pages: 1 / 27
Page count: 27
Related papers
23 in total
[1] Ada Wai-Chee Fu, 2000, Foundations of Intelligent Systems. 12th International Symposium, ISMIS 2000. Proceedings (Lecture Notes in Artificial Intelligence Vol.1932), P59
[2] Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
[3] Amphawan K, 2009, COMM COM INF SC, V55, P18
[4] [Anonymous], 2002, Proceedings of the 2002 ACM SIGMOD international conference on Management of data, DOI 10.1145/564691.564737
[5] Mining frequent itemsets without support threshold: With and without item constraints [J]. Cheung, YL; Fu, AWC. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2004, 16 (09): 1052-1069
[6] Fournier-Viger P., 2017, Data Sci. Pattern Recogn, V1, P54
[7] Goethals B., 2003, Frequent itemset mining dataset repository
[8] Han JW, 2002, 2002 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, P211, DOI 10.1109/ICDM.2002.1183905
[9] Han JW, 2000, SIGMOD RECORD, V29, P1
[10] Mining top-k high utility itemsets with effective threshold raising strategies [J]. Krishnamoorthy, Srikumar. EXPERT SYSTEMS WITH APPLICATIONS, 2019, 117: 148-165