HARPP: HARnessing the Power of Power Sets for Mining Frequent Itemsets

被引：5

作者：

Yasir, Muhammad ^{[1
]}

Habib, Muhammad Asif ^{[2
]}

Sarwar, Shahzad ^{[3
]}

Faisal, Chaudhry Muhammad Nadeem ^{[2
]}

Ahmad, Mudassar ^{[2
]}

Jabbar, Sohail ^{[2
]}

机构：

[1] Univ Engn & Technol Lahore, Dept Comp Sci, Faisalabad Campus, Faisalabad, Pakistan

[2] Natl Text Univ, Dept Comp Sci, Faisalabad, Pakistan

[3] Univ Punjab, Punjab Univ Coll Informat Technol, Lahore, Pakistan

来源：

INFORMATION TECHNOLOGY AND CONTROL | 2019年 / 48卷 / 03期

关键词：

Association Rules; Frequent Itemset Mining; Apriori; FP-Growth; Recommendation Systems; N-LIST; ALGORITHM; PATTERNS; CBAR;

D O I：

10.5755/j01.itc.48.3.21137

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

Modern algorithms for mining frequent itemsets face the noteworthy deterioration of performance when minimum support tends to decrease, especially for sparse datasets. Long-tailed itemsets, frequent itemsets found at lower minimum support, are significant for present-day applications such as recommender systems. In this study, a novel power set based method named as HARnessing the Power of Power sets (HARPP) for mining frequent itemsets is developed. HARPP is based on the concept of power set from set theory and incorporates efficient data structures for mining. Without storing it entirely in memory, HARPP scans the dataset only once and mines frequent itemsets on the fly. In contrast to state-of-the-art, the efficiency of HARPP increases with a decrease in minimum support that makes it a viable technique for mining long-tailed itemsets. A performance study shows that HARPP is efficient and scalable. It is faster up to two orders of magnitude than FP-Growth algorithm at lower minimum support, particularly when datasets are sparse. HARPP memory consumption is less than that of state-of-the-art by an order of magnitude, on most datasets.

引用

页码：415 / 431

页数：17

共 64 条

[1] Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
[2] Agrawal R., P 20 INT C VERY LARG, DOI DOI 10.1055/S-2007-996789
[3] A Hybrid Book Recommender System Based on Table of Contents (ToC) and Association Rule Mining
Ali, Zafar
Khusro, Shah
Ullah, Irfan
[J]. INTERNATIONAL CONFERENCE ON INFORMATICS AND SYSTEMS (INFOS 2016), 2016, : 68 - 74
[4] Anand R., 2014, MINING MASSIVE DATAS, DOI [10.1017/CBO9781139924801, DOI 10.1017/CB09781139924801]
[5] [Anonymous], 2004, P 2004 ACM S APPL CO, DOI DOI 10.1145/967900.968012
[6] A Parallel MapReduce Algorithm to Efficiently Support Itemset Mining on High Dimensional Data
Apiletti, Daniele
Baralis, Elena
Cerquitelli, Tania
Garza, Paolo
Pulvirenti, Fabio
Michiardi, Pietro
[J]. BIG DATA RESEARCH, 2017, 10 : 53 - 69
[7] Frequent Itemsets Mining for Big Data: A Comparative Analysis
Apiletti, Daniele
Baralis, Elena
Cerquitelli, Tania
Garza, Paolo
Pulvirenti, Fabio
Venturini, Luca
[J]. BIG DATA RESEARCH, 2017, 9 : 67 - 83
[8] Asha T., 2011, Proceedings of the International Conference Workshop on Emerging Trends in Technology, P1327
[9] Scale-Free Networks: A Decade and Beyond
Barabasi, Albert-Laszlo
[J]. SCIENCE, 2009, 325 (5939) : 412 - 413
[10] A novel approach for mining maximal frequent patterns
Bay Vo
Sang Pham
Tuong Le
Deng, Zhi-Hong
[J]. EXPERT SYSTEMS WITH APPLICATIONS, 2017, 73 : 178 - 186

← 1 2 3 4 5 6 7 →