Performance Comparison of Distributed Pattern Matching Algorithms on Hadoop MapReduce Framework

Cited by: 0
Authors
Sona, C. P. [1]
Mulerikkal, Jaison Paul [1]
Affiliations
[1] Rajagiri Sch Engn & Technol, Sunya Labs, Kochi, Kerala, India
Source
MOBILE NETWORKS AND MANAGEMENT (MONAMI 2017) | 2018 / Vol. 235
Keywords
Pattern matching; Hadoop; MapReduce; Big Data; Knuth Morris Pratt Algorithm; Boyer Moore Algorithm; Franek Jennings Smyth Algorithm
DOI
10.1007/978-3-319-90775-8_4
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Extracting meaning from ever-growing Big Data is a formidable challenge for data scientists, and pattern matching algorithms are an effective means of extracting it from heaps of data. However, the available pattern matching algorithms have mostly been tested under linear (sequential) programming models; their adaptability and efficiency have not been tested under distributed programming models such as Hadoop MapReduce, which supports Big Data. This paper reports the experience of parallelizing three such algorithms, namely the Knuth Morris Pratt (KMP) algorithm, the Boyer Moore (BM) algorithm and the lesser-known Franek Jennings Smyth (FJS) algorithm, and porting them to the Hadoop MapReduce framework. All three algorithms were converted to MapReduce programs using key-value pairs and run on both single-node and cluster Hadoop environments. Analysis of the results on the Project Gutenberg data set shows that all three parallel algorithms scale well on Hadoop as the data size increases. The experimental results show that KMP outperforms BM for shorter patterns, whereas BM outperforms KMP for longer patterns. However, FJS, a hybrid of KMP and the Boyer-Moore-Horspool algorithm (an advanced version of BM), outperforms both KMP and BM for shorter and longer patterns alike, and emerges as the most suitable algorithm for pattern matching in a Hadoop environment.
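The abstract does not reproduce the MapReduce programs, so the following is only a minimal sketch of how one of the three algorithms (KMP) might be expressed with key-value pairs. It simulates the map and reduce phases in plain Python rather than calling Hadoop; the function names (`map_line`, `reduce_counts`, `run_job`) and the choice of the pattern as key and the per-line match count as value are assumptions made for illustration, not the authors' design. One simplification to note: matches that span a line boundary are not detected.

```python
from collections import defaultdict


def build_failure(pattern):
    """KMP failure table: fail[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_count(text, pattern, fail):
    """Count occurrences of pattern in text, overlapping ones included."""
    count = 0
    k = 0  # number of pattern characters currently matched
    for ch in text:
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            count += 1
            k = fail[k - 1]  # fall back so overlapping matches are found
    return count


def map_line(line, pattern, fail):
    """Hypothetical map step: emit (pattern, matches_in_line) for one line."""
    n = kmp_count(line, pattern, fail)
    if n:
        yield (pattern, n)


def reduce_counts(pairs):
    """Hypothetical reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(lines, pattern):
    """Simulate a single-pattern job over an iterable of input lines."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    fail = build_failure(pattern)
    pairs = [kv for line in lines for kv in map_line(line, pattern, fail)]
    return reduce_counts(pairs)
```

For example, `run_job(["abababa", "xyz", "aba"], "aba")` counts three overlapping matches in the first line and one in the third. In an actual Hadoop job the map step would run once per input split and the framework would group the emitted pairs by key before the reduce step; here both phases run in a single process.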
Pages: 45-55
Page count: 11
Related papers
50 in total
  • [21] The Performance Evaluation of a Distributed Image Classification Pipeline Based on Hadoop and MapReduce with Initial Application to Medical Images
    Guo, Shujian
    Zhang, Yaonan
    Wu, Qiushi
    Niu, Lechuan
    Zhang, Wenwei
    Li, Songbai
    JOURNAL OF MEDICAL IMAGING AND HEALTH INFORMATICS, 2018, 8 (01) : 78 - 83
  • [22] Performance Improvement of MapReduce Framework by Identifying Slow TaskTrackers in Heterogeneous Hadoop Cluster
    Naik, Nenavath Srinivas
    Negi, Atul
    Sastry, V. N.
    PROCEEDINGS OF 3RD INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING, NETWORKING AND INFORMATICS, ICACNI 2015, VOL 2, 2016, 44 : 465 - 473
  • [23] A Hadoop MapReduce Performance Prediction Method
    Song, Ge
    Meng, Zide
    Huet, Fabrice
    Magoules, Frederic
    Yu, Lei
    Lin, Xuelian
    2013 IEEE 15TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2013 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (HPCC_EUC), 2013, : 820 - 825
  • [24] Optimisation of Hadoop MapReduce Configuration Parameter Settings Using Genetic Algorithms
    Khaleel, Ali
    Al-Raweshidy, H. S.
    INTELLIGENT COMPUTING, VOL 2, 2019, 857 : 40 - 52
  • [25] HOG: Distributed Hadoop MapReduce on the Grid
    He, Chen
    Weitzel, Derek
    Swanson, David
    Lu, Ying
    2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC), 2012, : 1276 - 1283
  • [26] Performance Evaluation and Tuning for MapReduce Computing in Hadoop Distributed File System
    Kim, Jongyeop
    Kumar, Ashwin T. K.
    George, K. M.
    Park, Nohpill
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2015, : 62 - 68
  • [27] SmartGrids: MapReduce Framework using Hadoop
    Fanibhare, Vaibhav
    Dahake, Vijay
    2016 3RD INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2016, : 406 - 411
  • [28] IMapC: Inner MAPping Combiner to Enhance the Performance of MapReduce in Hadoop
    Kavitha, C.
    Srividhya, S. R.
    Lai, Wen-Cheng
    Mani, Vinodhini
    ELECTRONICS, 2022, 11 (10)
  • [29] Performance Modelling and Analysis of MapReduce/Hadoop Workloads
    Yu, Xiaolong
    Li, Wei
    2015 IEEE 21ST INTERNATIONAL WORKSHOP ON LOCAL & METROPOLITAN AREA NETWORKS (LANMAN), 2015,
  • [30] Performance analysis of MapReduce Programs on Hadoop cluster
    Maurya, Mahesh
    Mahajan, Sunita
    PROCEEDINGS OF THE 2012 WORLD CONGRESS ON INFORMATION AND COMMUNICATION TECHNOLOGIES, 2012, : 505 - 510