High Performance Parallelization of Boyer-Moore Algorithm on Many-Core Accelerators

Cited by: 0

Authors:

Jeong, Yosang [1]

Lee, Myungho [1]

Nam, Dukyun [2]

Kim, Jik-Soo [2]

Hwang, Soonwook [2]

Affiliations:

[1] Myongji Univ, Dept Comp Sci & Engn, 116 Myongji Ro, Yongin, Kyung Ki Do, South Korea

[2] Korea Inst Sci & Technol Informat, Supercomp R&D Ctr, Daejeon, South Korea

Source:

2014 INTERNATIONAL CONFERENCE ON CLOUD AND AUTONOMIC COMPUTING (ICCAC 2014) | 2014

Keywords:

Boyer-Moore algorithm; many-core accelerator; parallelization; dynamic scheduling; multithreading; algorithmic cascading

DOI:

10.1109/ICCAC.2014.20

Chinese Library Classification number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

The Boyer-Moore (BM) algorithm is a single-pattern string matching algorithm. It is considered the most efficient string matching algorithm and is used in many applications. In the preprocessing phase, the algorithm first calculates two string shift rules based on the given pattern string. These rules help skip parts of the target input string where no match can be found. In the second phase, pattern matching operations are performed against the target input string using the two shift rules. The second phase is time consuming and needs to be parallelized to achieve high-performance string matching. In this paper, we parallelize the BM algorithm on the latest many-core accelerators, such as the Intel Xeon Phi and the Nvidia Tesla K20 GPU, along with general-purpose multi-core processors. We partition the target input data among multiple threads for parallel execution. Data lying on the threads' boundaries need to be copied redundantly so that a pattern string lying on a boundary can be found. As the target length increases, the algorithm incurs more matching operations. Also, as the pattern length increases, the number of possible matches decreases. This can potentially lead to an unbalanced workload distribution among threads. Furthermore, the redundant data copy significantly overloads the on-chip shared memories of the GPU for a large number of threads. We use dynamic scheduling and multithreading techniques to solve the load balancing problem. We also use the algorithmic cascading technique to reduce the burden on the shared memories of the GPU. Our parallel implementation leads to approximately 17-times speedup on the Xeon Phi and approximately 45-times speedup on the Nvidia Tesla K20 GPU compared with a serial implementation on the host Intel Xeon processor.
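
The partitioning idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It makes several assumptions: it uses the simpler Horspool variant (bad-character shift only) in place of the full two-rule BM algorithm, it runs on CPU threads rather than a Xeon Phi or GPU, and the names `bad_char_table`, `horspool_search`, `parallel_search`, `chunk_size`, and `workers` are hypothetical. Each chunk owns the match positions that start inside it and reads up to m - 1 characters past its end, which plays the role of the redundant boundary data described above. Submitting more chunks than workers lets an idle worker pick up the next chunk, which is loosely analogous to dynamic scheduling.

```python
from concurrent.futures import ThreadPoolExecutor


def bad_char_table(pattern):
    """Horspool shift table: for each character in pattern[:-1], the distance
    from its last occurrence to the end of the pattern."""
    m = len(pattern)
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def horspool_search(text, pattern, start, end):
    """Return every match start position p with start <= p < end."""
    m = len(pattern)
    shift = bad_char_table(pattern)
    matches = []
    pos = start
    # A match starting just before `end` may read up to m - 1 characters
    # beyond it: this is the boundary overlap between neighbouring chunks.
    limit = min(end - 1, len(text) - m)
    while pos <= limit:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Shift by the table entry for the last character of the window;
        # characters absent from the table allow a full shift of m.
        pos += shift.get(text[pos + m - 1], m)
    return matches


def parallel_search(text, pattern, chunk_size=1 << 16, workers=4):
    """Split `text` into chunks and search them with a thread pool."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(horspool_search, text, pattern,
                        s, min(s + chunk_size, len(text)))
            for s in range(0, len(text), chunk_size)
        ]
        # Futures are collected in chunk order, so positions stay sorted.
        results = [f.result() for f in futures]
    return [p for chunk in results for p in chunk]


if __name__ == "__main__":
    print(parallel_search("abracadabra" * 3, "abra", chunk_size=5))
    # prints [0, 7, 11, 18, 22, 29]
```

In shared memory the overlap costs nothing because every thread reads the same string. On a GPU, where each thread block stages its chunk in on-chip memory, the overlap becomes an actual copy, which is the overhead the abstract attributes to redundant boundary data. In CPython the threads mainly illustrate the decomposition, since the global interpreter lock limits the speedup of pure-Python loops.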


Pages: 265-272

Page count: 8
