FSMBUS: A frequent subgraph mining algorithm in single large-scale graph using spark

被引：0

作者：

Yan, Yuliang ^{[1
]}

Dong, Yihong ^{[1
]}

He, Xianmang ^{[1
]}

Wang, Wei ^{[2
]}

机构：

[1] Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, 315211, Zhejiang

[2] School of Computer Science, Fudan University, Shanghai

来源：

Jisuanji Yanjiu yu Fazhan/Computer Research and Development | 2015年 / 52卷 / 08期

关键词：

Distribute mining; Frequent subgraph; Single large-scale graph; Spark; Workload balance;

D O I：

10.7544/issn1000-1239.2015.20150256

中图分类号：

学科分类号：

摘要：

Mining frequent subgraphs in a single large-scale graph is of huge demand with the rapid growth of the social networking. However, it is inefficient for the serial algorithms to mine frequent subgraphs in low support when mining for a single large-scale graph. Meanwhile, few existing distributed algorithms can't support the growth pattern mining, and the Hadoop framework they worked is not suitable for iterative running. In this paper, a distributed algorithm named FSMBUS for mining frequent subgraph in a single large-scale graph under Spark framework is proposed. It constructs the parallel computing candidate subgraphs by suboptimal CAM Tree, which returns all the frequent subgraphs for given user-defined minimum support. Additionally, infrequent patterns' test and searching order chosen is introduced to optimize the algorithm. Sorted-Greedy method is designed for data partition to balance the workload. Our experiments show that FSMBUS runs faster and more effective than the existing algorithms with real datasets, and even can run with the lower support threshold and the larger graph datasets as well. At the same time, FSMBUS runs 2~4 times faster on Spark framework than that on Hadoop framework. ©, 2015, Science Press. All right reserved.

引用

页码：1768 / 1783

页数：15

共 28 条

[1] Borgelt C., Berthold M.R., Patterson D.E., Molecular fragment mining for drug discovery [G], Symbolic and Quantitative Approaches to Reasoning with Uncertainty, pp. 1002-1013, (2005)
[2] Wang G., Yin J., Zhan W., A novel graph classification approach based on embedding sets, Journal of Computer Research and Development, 49, 11, pp. 2311-2319, (2012)
[3] Guralnik V., Karypis G., A scalable algorithm for clustering sequential data, Proc of the 1st IEEE Int Conf on Data Mining, pp. 179-186, (2001)
[4] Yan X., Yu P.S., Han J., Graph indexing: A frequent structure-based approach, Proc of the 17th ACM SIGMOD Int Conf on Management of Data, pp. 335-346, (2004)
[5] Liu Y., Jiang X., Chen H., Et al., Mapreduce-based pattern finding algorithm applied in motif detection for prescription compatibility network [G], Advanced Parallel Processing Technologies, pp. 341-355, (2009)
[6] Shahrivari S., Jalili S., Distributed discovery of frequent subgraphs of a network using MapReduce, (2015)
[7] Elseidy M., Abdelhamid E., Skiadopoulos S., Et al., GRAMI: Frequent subgraph and pattern mining in a single large graph, Proc of the 40th Int Conf on Very Large Data Bases, pp. 517-528, (2014)
[8] Bhuiyan M.A., Al Hasan M., An iterative MapReduce based frequent subgraph mining algorithm, IEEE Trans on Knowledge and Data Engineering, 27, 3, pp. 608-620, (2013)
[9] Lu W., Chen G., Tung A.K.H., Et al., Efficiently extracting frequent subgraphs using mapreduce, Proc of the 1st IEEE Int Conf on Big Data, pp. 639-647, (2013)
[10] Dean J., Ghemawat S., MapReduce: Simplified data processing on large clusters, Communications of the ACM, 51, 1, pp. 107-113, (2008)

← 1 2 3 →