High Performance Parallelization of Boyer-Moore Algorithm on Many-Core Accelerators

被引:0
|
作者
Jeong, Yosang [1 ]
Lee, Myungho [1 ]
Nam, Dukyun [2 ]
Kim, Jik-Soo [2 ]
Hwang, Soonwook [2 ]
机构
[1] Myongji Unviers, Dept Comp Sci & Engn, 116 Myongji Ro, Yongin, Kyung Ki Do, South Korea
[2] Korea Inst Sci & Technol Informat, Supercomp R&D Ctr, Daejeon, South Korea
关键词
Boyer-Moore algorithm; many-core accelerator; parallelization; dynamic scheduling; multithreading; algorithmic cascading;
D O I
10.1109/ICCAC.2014.20
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Boyer-Moore (BM) algorithm is a single pattern string matching algorithm. It is considered as the most efficient string matching algorithm and used in many applications. The algorithm first calculates two string shift rules based on the given pattern string in the preprocessing phase. These rules help skip parts of the target input string where there is no match to be found. Using the two shift rules, pattern matching operations are performed against the target input sting in the second phase. The second phase is a time consuming process and needs to be parallelized to achieve the high performance string matching. In this paper, we parallelize the BM algorithm on the latest many-core accelerators such as the Intel Xeon Phi and the Nvidia Tesla K20 GPU, along with the general-purpose multi-core processors. We partition the target input data amongst multiple threads for parallel execution. Data lying on the threads' boundaries need to be copied redundantly so that the pattern string lying on the boundary can be found. As the target length increases, the algorithm incurs increased matching operations. Also, as the pattern length increases, the number of possible matches decreases. This can potentially lead to the unbalanced workload distribution among threads. Furthermore, the redundant data copy significantly overloads the on-chip shared memories of the GPU for a large number of threads. We use the dynamic scheduling and the multithreading techniques to solve the load balancing problem. We also use the algorithmic cascading technique to reduce the burden on the shared memories of the GPU. Our parallel implementation leads to similar to 17-times speedup on the Xeon Phi and similar to 45-times speedup on the Nvidia Tesla K20 GPU compared with a serial implementation on the host Intel Xeon processor.
引用
收藏
页码:265 / 272
页数:8
相关论文
共 50 条
  • [41] Parallelization and sustainability of distributed genetic algorithms on many-core processors
    Sato, Yuji
    Sato, Mikiko
    INTERNATIONAL JOURNAL OF INTELLIGENT COMPUTING AND CYBERNETICS, 2014, 7 (01) : 2 - 23
  • [42] Speedup and Parallelization Models for Energy-Efficient Many-Core Systems Using Performance Counters
    Al-hayanni, Mohammed A. N.
    Shafik, Rishad
    Rafiev, Ashur
    Xia, Fei
    Yakovlev, Alex
    2017 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS), 2017, : 410 - 417
  • [43] Design of Direct Communication Facility for Many-Core based Accelerators
    Si, Min
    Ishikawa, Yutaka
    2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW), 2012, : 924 - 929
  • [44] A New Method to Obtain the shift-table in Boyer-Moore's String Matching Algorithm
    Wang, Yang
    19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008, : 3282 - 3285
  • [45] Application Deployment Strategies for Spatial Isolation on Many-Core Accelerators
    Real, Maria Mendez
    Wehner, Philipp
    Lapotre, Vianney
    Goehringer, Diana
    Gogniat, Guy
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2018, 17 (02)
  • [46] Performance/energy aware task migration algorithm for many-core chips
    Afsharpour, Sima
    Patooghy, Ahmad
    Fazeli, Mahdi
    IET COMPUTERS AND DIGITAL TECHNIQUES, 2016, 10 (04): : 165 - 173
  • [47] Parallelization and fault-tolerance of evolutionary computation on many-core processors
    Sato, Yuji
    Sato, Mikiko
    2013 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2013, : 2602 - 2609
  • [48] A Nanophotonic Interconnect for High-Performance Many-Core Computation
    Beausoleil, R. G.
    Fiorentino, M.
    Ahn, J.
    Binkert, N.
    Davis, A.
    Fattal, D.
    Jouppi, N. P.
    McLaren, M.
    Santori, C. M.
    Schreiber, R. S.
    Spillane, S. M.
    Vantrease, D.
    Xu, Q.
    2008 5TH IEEE INTERNATIONAL CONFERENCE ON GROUP IV PHOTONICS, 2008, : 365 - 367
  • [49] A nanophotonic interconnect for high-performance many-core computation
    Beausoleil, R. G.
    Ahn, J.
    Binkert, N.
    Davis, A.
    Fattal, D.
    Fiorentino, M.
    Jouppi, N. P.
    McLaren, M.
    Santori, C. M.
    Schreiber, R. S.
    Spillane, S. M.
    Vantrease, D.
    Xu, Q.
    16TH ANNUAL IEEE SYMPOSIUM ON HIGH-PERFORMANCE INTERCONNECTS, PROCEEDINGS, 2008, : 182 - 189
  • [50] A Compressive Sensing Algorithm for Many-Core Architectures
    Borghi, A.
    Darbon, J.
    Peyronnet, S.
    Chan, T. F.
    Osher, S.
    ADVANCES IN VISUAL COMPUTING, PT II, 2010, 6454 : 678 - 686