Pattern mining for large distributed dataset: A parallel approach (PMLDD)

Cited by: 1
Authors
Pal, Amrit [1]
Kumar, Manish [1]
Affiliations
[1] Indian Inst Informat Technol, Dept Informat Technol, Allahabad, Uttar Pradesh, India
Keywords
Association rule; Frequent pattern; HDFS; Large datasets; MapReduce; Tree
DOI
10.3837/tiis.2018.11.007
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Handling the vast amount of data found in large transactional datasets is a challenge for conventional data mining algorithms. To address this challenge, our paper proposes a parallel approach that decomposes the mining problem into sub-problems in order to find frequent patterns in these datasets. The proposed approach, Pattern Mining for Large Distributed Dataset (PMLDD), keeps both the dependencies and the communication among sub-problems to a minimum. It aggregates the intermediate results linearly, so it can be adapted to large-scale programming models such as MapReduce. In this context, an algorithmic structure for the MapReduce programming model is presented. PMLDD balances the load among the sub-problems efficiently by means of a specific selection criterion. Further, it reduces the number of iterations over the dataset required for mining frequent patterns compared with existing approaches. Finally, the performance evaluation indicates that the approach is scalable enough to handle larger datasets, and the result analysis supports the claims above.
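The idea of decomposing the mining problem into independent sub-problems whose intermediate results aggregate linearly can be illustrated with a minimal sketch. This is not the PMLDD algorithm itself, whose selection criterion and iteration scheme are not given in the abstract; it is a generic partition-and-sum itemset counter written in a map/reduce style. The function names `map_partition`, `reduce_counts`, and `frequent_itemsets`, the `max_size` cap, and the absolute `min_support` threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def map_partition(transactions, max_size=2):
    """Map step (hypothetical): count itemsets inside one partition only.

    No information from other partitions is needed, so partitions can be
    processed independently and in parallel.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order, no duplicates
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts


def reduce_counts(partial_counts):
    """Reduce step (hypothetical): sum the per-partition counts.

    The aggregation is a plain addition, so its cost grows linearly with
    the number of intermediate results.
    """
    total = Counter()
    for counts in partial_counts:
        total.update(counts)  # Counter.update adds counts together
    return total


def frequent_itemsets(partitions, min_support, max_size=2):
    """Return the itemsets whose global count reaches min_support."""
    total = reduce_counts(map_partition(p, max_size) for p in partitions)
    return {itemset: n for itemset, n in total.items() if n >= min_support}
```

Because the global count of an itemset is simply the sum of its per-partition counts, the map step needs no communication between partitions. A real MapReduce job would emit `(itemset, count)` pairs from mappers and sum them in reducers; the in-memory `Counter` here only stands in for that shuffle.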
Pages: 5287-5303
Number of pages: 17