Mining Distributed Frequent Itemsets Using a Gossip Based Protocol

被引:0
作者
Bagheri, Maryam [1 ]
Mirian-Hosseinabadi, Seyed-Hassan [1 ]
Mashayekhi, Hoda [1 ]
Habibi, Jafar [1 ]
机构
[1] Sharif Univ Technol, Dept Comp Engn, Tehran, Iran
来源
2012 9TH INTERNATIONAL CONFERENCE ON UBIQUITOUS INTELLIGENCE & COMPUTING AND 9TH INTERNATIONAL CONFERENCE ON AUTONOMIC & TRUSTED COMPUTING (UIC/ATC) | 2012年
关键词
data mining; frequent itemset mining; gossip; trie;
D O I
10.1109/UIC-ATC.2012.136
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Recently, there has been a growing attention in frequent item set mining in distributed systems. In this paper, we present an algorithm to extract frequent item sets from large distributed datasets. Our algorithm uses gossip as the communication mechanism and does not rely on any central node. In gossip based communication, nodes repeatedly select other random nodes in the system, and exchange information with them. Our algorithm proceeds in rounds and provides all nodes with the required support counts of item sets, such that each node is able to extract the global frequent item sets. For local iteration and generation of candidate item sets, a trie data structure is used, which facilitates the process and reduces execution time. We further propose an improvement to our algorithm by grouping nodes and arranging them into a hierarchical structure. By performing aggregation tasks in groups, communication overhead is effectively reduced. We evaluate our proposal using simulation, and show advantages of our algorithms in reducing execution time and communication overhead, while preserving accuracy.
引用
收藏
页码:780 / 785
页数:6
相关论文
共 17 条
  • [11] Gossip-based aggregation in large dynamic networks
    Jelasity, M
    Montresor, A
    Babaoglu, O
    [J]. ACM TRANSACTIONS ON COMPUTER SYSTEMS, 2005, 23 (03): : 219 - 252
  • [12] Kashyap S., 2006, Proceedings of the Symposium on Principles of Database Systems (PODS), P308
  • [13] Gossip-based computation of aggregate information
    Kempe, D
    Dobra, A
    Gehrke, J
    [J]. 44TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2003, : 482 - 491
  • [14] Identifying frequent items in a network using gossip
    Lahiri, Bibudh
    Tirthapura, Srikanta
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2010, 70 (12) : 1241 - 1253
  • [15] FINDING REPEATED ELEMENTS
    MISRA, J
    GRIES, D
    [J]. SCIENCE OF COMPUTER PROGRAMMING, 1982, 2 (02) : 143 - 152
  • [16] A high-performance distributed algorithm for mining association rules
    Schuster, A
    Wolff, R
    Trock, D
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2005, 7 (04) : 458 - 475
  • [17] Parallel and Distributed Frequent Pattern Mining in Large Databases
    Tanbeer, Syed Khairuzzaman
    Ahmed, Chowdhury Farhan
    Jeong, Byeong-Soo
    [J]. HPCC: 2009 11TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2009, : 407 - 414