HARPP: HARnessing the Power of Power Sets for Mining Frequent Itemsets

被引:5
作者
Yasir, Muhammad [1 ]
Habib, Muhammad Asif [2 ]
Sarwar, Shahzad [3 ]
Faisal, Chaudhry Muhammad Nadeem [2 ]
Ahmad, Mudassar [2 ]
Jabbar, Sohail [2 ]
机构
[1] Univ Engn & Technol Lahore, Dept Comp Sci, Faisalabad Campus, Faisalabad, Pakistan
[2] Natl Text Univ, Dept Comp Sci, Faisalabad, Pakistan
[3] Univ Punjab, Punjab Univ Coll Informat Technol, Lahore, Pakistan
来源
INFORMATION TECHNOLOGY AND CONTROL | 2019年 / 48卷 / 03期
关键词
Association Rules; Frequent Itemset Mining; Apriori; FP-Growth; Recommendation Systems; N-LIST; ALGORITHM; PATTERNS; CBAR;
D O I
10.5755/j01.itc.48.3.21137
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Modern algorithms for mining frequent itemsets face the noteworthy deterioration of performance when minimum support tends to decrease, especially for sparse datasets. Long-tailed itemsets, frequent itemsets found at lower minimum support, are significant for present-day applications such as recommender systems. In this study, a novel power set based method named as HARnessing the Power of Power sets (HARPP) for mining frequent itemsets is developed. HARPP is based on the concept of power set from set theory and incorporates efficient data structures for mining. Without storing it entirely in memory, HARPP scans the dataset only once and mines frequent itemsets on the fly. In contrast to state-of-the-art, the efficiency of HARPP increases with a decrease in minimum support that makes it a viable technique for mining long-tailed itemsets. A performance study shows that HARPP is efficient and scalable. It is faster up to two orders of magnitude than FP-Growth algorithm at lower minimum support, particularly when datasets are sparse. HARPP memory consumption is less than that of state-of-the-art by an order of magnitude, on most datasets.
引用
收藏
页码:415 / 431
页数:17
相关论文
共 64 条
  • [1] Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
  • [2] Agrawal R., P 20 INT C VERY LARG, DOI DOI 10.1055/S-2007-996789
  • [3] A Hybrid Book Recommender System Based on Table of Contents (ToC) and Association Rule Mining
    Ali, Zafar
    Khusro, Shah
    Ullah, Irfan
    [J]. INTERNATIONAL CONFERENCE ON INFORMATICS AND SYSTEMS (INFOS 2016), 2016, : 68 - 74
  • [4] Anand R., 2014, MINING MASSIVE DATAS, DOI [10.1017/CBO9781139924801, DOI 10.1017/CB09781139924801]
  • [5] [Anonymous], 2004, P 2004 ACM S APPL CO, DOI DOI 10.1145/967900.968012
  • [6] A Parallel MapReduce Algorithm to Efficiently Support Itemset Mining on High Dimensional Data
    Apiletti, Daniele
    Baralis, Elena
    Cerquitelli, Tania
    Garza, Paolo
    Pulvirenti, Fabio
    Michiardi, Pietro
    [J]. BIG DATA RESEARCH, 2017, 10 : 53 - 69
  • [7] Frequent Itemsets Mining for Big Data: A Comparative Analysis
    Apiletti, Daniele
    Baralis, Elena
    Cerquitelli, Tania
    Garza, Paolo
    Pulvirenti, Fabio
    Venturini, Luca
    [J]. BIG DATA RESEARCH, 2017, 9 : 67 - 83
  • [8] Asha T., 2011, Proceedings of the International Conference Workshop on Emerging Trends in Technology, P1327
  • [9] Scale-Free Networks: A Decade and Beyond
    Barabasi, Albert-Laszlo
    [J]. SCIENCE, 2009, 325 (5939) : 412 - 413
  • [10] A novel approach for mining maximal frequent patterns
    Bay Vo
    Sang Pham
    Tuong Le
    Deng, Zhi-Hong
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2017, 73 : 178 - 186