Performance Comparison of Distributed Pattern Matching Algorithms on Hadoop MapReduce Framework

Cited by: 0
Authors
Sona, C. P. [1]
Mulerikkal, Jaison Paul [1]
Affiliations
[1] Rajagiri Sch Engn & Technol, Sunya Labs, Kochi, Kerala, India
Source
MOBILE NETWORKS AND MANAGEMENT (MONAMI 2017) | 2018 / Vol. 235
Keywords
Pattern matching; Hadoop; MapReduce; Big Data; Knuth Morris Pratt Algorithm; Boyer Moore Algorithm; Franek Jennings Smyth Algorithm;
DOI
10.1007/978-3-319-90775-8_4
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Creating meaning out of ever-growing Big Data is a formidable challenge for data scientists, and pattern matching algorithms are a powerful means of extracting such meaning from heaps of data. However, the available pattern matching algorithms have mostly been tested with linear programming models; their adaptability and efficiency have not been tested in distributed programming models such as Hadoop MapReduce, which supports Big Data. This paper describes the experience of parallelizing three such pattern matching algorithms, namely the Knuth Morris Pratt (KMP) algorithm, the Boyer Moore (BM) algorithm, and the lesser-known Franek Jennings Smyth (FJS) algorithm, and porting them to the Hadoop MapReduce framework. All three algorithms were converted to MapReduce programs using key-value pairs and run on both single-node and cluster Hadoop environments. Analysis of the results on the Project Gutenberg data set shows that all three parallel algorithms scale well on Hadoop as the data size increases. The experimental results show that KMP outperforms BM for shorter patterns, and BM outperforms KMP for longer patterns. However, the FJS algorithm, a hybrid of KMP and the Boyer Horspool algorithm (itself an advanced version of BM), outperforms both KMP and BM for shorter and longer patterns alike, and emerges as the most suitable algorithm for pattern matching in a Hadoop environment.
Pages: 45 - 55
Page count: 11
Related papers
50 in total
  • [31] Design and Implement a MapReduce Framework for Executing Standalone Software Packages in Hadoop-based Distributed Environments
    Chen, Chao-Chun
    Hung, Min-Hsiung
    Giang, Nguyen Huu Tinh
    Lin, Hsuan-Chun
    Lin, Tzu-Chao
    SMART SCIENCE, 2013, 1 (02) : 99 - 107
  • [32] Conductor Temperature Estimation Using the Hadoop MapReduce Framework for Smart Grid Applications
    Pan, Sheng-Kai
    Jiang, Joe-Air
    Chen, Chia-Pang
    2014 IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2014 IEEE 6TH INTL SYMP ON CYBERSPACE SAFETY AND SECURITY, 2014 IEEE 11TH INTL CONF ON EMBEDDED SOFTWARE AND SYST (HPCC,CSS,ICESS), 2014, : 1243 - 1247
  • [33] Modified MapReduce Framework for Enhancing Performance of Graph Based Algorithms by Fast Convergence in Distributed Environment
    Singhal, Hitesh
    Guddeti, Ram Mohana Reddy
    2014 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2014, : 1240 - 1245
  • [34] A Performance Analysis of MapReduce Applications on Big Data in Cloud based Hadoop
    Gohil, Parth
    Garg, Dweepna
    Panchal, Bakul
    2014 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES), 2014,
  • [35] Enhancing Performance of Hadoop and Mapreduce for Scientific Data using NoSQL Database
    Alshammari, Hamoud
    Bajwa, Hassan
    Lee, Jeongkyu
    2015 IEEE LONG ISLAND SYSTEMS, APPLICATIONS AND TECHNOLOGY CONFERENCE (LISAT), 2015,
  • [36] Performance Analysis of Matrix and Graph Computations using Data Compression Techniques in MPI and Hadoop MapReduce in Big Data Framework
    Ramakrishnaiah, Nagendla
    Reddy, Sirigiri Konda
    2017 IEEE INTERNATIONAL CONFERENCE ON SMART TECHNOLOGIES AND MANAGEMENT FOR COMPUTING, COMMUNICATION, CONTROLS, ENERGY AND MATERIALS (ICSTM), 2017, : 54 - 62
  • [37] Performance optimization of MapReduce-based Apriori algorithm on Hadoop cluster
    Singh, Sudhakar
    Garg, Rakhi
    Mishra, P. K.
    COMPUTERS & ELECTRICAL ENGINEERING, 2018, 67 : 348 - 364
  • [38] Noninvasive MapReduce Performance Tuning Using Multiple Tuning Methods on Hadoop
    Chen, Donghua
    Zhang, Runtong
    Qiu, Robin Guanghua
    IEEE SYSTEMS JOURNAL, 2021, 15 (02) : 2906 - 2917
  • [39] Improving Hadoop MapReduce performance on heterogeneous single board computer clusters
    Lim, Sooyoung
    Park, Dongchul
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2024, 160 : 752 - 766
  • [40] Algorithms for Iterative Applications in MapReduce Framework
    Reddy, A. Diwakar
    Reddy, J. Geetha
    INTERNATIONAL PROCEEDINGS ON ADVANCES IN SOFT COMPUTING, INTELLIGENT SYSTEMS AND APPLICATIONS, ASISA 2016, 2018, 628 : 51 - 61