Research and Improvement of Parallelization of FP-Growth Algorithm Based on Spark

Cited by: 0
Authors
Zhang, Fan [1]
Xiao, Youan
Long, Yihong
Affiliations
[1] Wuhan Univ Technol, Sch Informat Engn, Wuhan, Hubei, Peoples R China
Source
PROCEEDINGS OF 2017 IEEE 7TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC) | 2017
Keywords
FP-Growth; DGFP; Spark; load balancing; dynamic grouping;
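DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;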
Abstract
To address the load imbalance that arises when the existing FP-Growth algorithm groups the data set, an optimized algorithm, DGFP-Growth, that dynamically partitions the data set is proposed. First, a load estimation method is applied before the dynamic grouping: the position of each item in the frequent item list and the size of its support are used to determine its load weight. Then, under the grouping strategy, the items are assigned to the corresponding groups according to their load weights to ensure that the load is balanced within each group. Experiments show that the proposed algorithm effectively mitigates parallel load imbalance and improves the overall efficiency of the cluster by 5% to 15%.
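The two steps described in the abstract (estimate a load weight per item, then assign items to groups by weight) can be illustrated with a minimal sketch. The abstract does not state the actual weight formula or grouping rule, so everything here is an assumption made for illustration: the hypothetical `load_weight` multiplies an item's support by the logarithm of its position in the frequent item list, and the hypothetical `balanced_groups` uses a greedy rule that always gives the next-heaviest item to the currently lightest group. It is plain Python rather than Spark code and should not be read as the DGFP-Growth implementation.

```python
import heapq
import math


def load_weight(position, support):
    """Hypothetical load estimate (not taken from the paper):
    support scaled by log2 of the item's 1-based position in the F-list."""
    return support * math.log2(position + 1)


def balanced_groups(f_list, num_groups):
    """Greedily assign items to groups so that group loads stay close.

    f_list: list of (item, support) pairs sorted by descending support.
    Returns (groups, loads): the items and the total weight of each group.
    """
    weighted = [
        (load_weight(pos, sup), item)
        for pos, (item, sup) in enumerate(f_list, start=1)
    ]
    # Heaviest items first; the sort is stable, so ties keep F-list order.
    weighted.sort(key=lambda pair: pair[0], reverse=True)

    # Min-heap of (current load, group index): the lightest group pops first.
    heap = [(0.0, g) for g in range(num_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]

    for weight, item in weighted:
        load, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (load + weight, g))

    loads = [0.0] * num_groups
    for load, g in heap:
        loads[g] = load
    return groups, loads


if __name__ == "__main__":
    # Toy F-list with made-up supports, for illustration only.
    groups, loads = balanced_groups([("a", 8), ("b", 6), ("c", 4), ("d", 2)], 2)
    print(groups, loads)
```

A static scheme that hashes or round-robins items into groups ignores how much work each item generates; the greedy rule above is one simple way to keep the per-group totals close, under the assumed weight function.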
Pages: 145-148
Number of pages: 4