Improving matrix-based dynamic programming on massively parallel accelerators

被引:6
作者
Bednarek, David [1 ]
Brabec, Michal [1 ]
Krulis, Martin [1 ]
机构
[1] Charles Univ Prague, Fac Math & Phys, Parallel Architectures Algorithms Applicat Res Gr, Malostranske Nam 25, Prague, Czech Republic
关键词
Parallel; Multicore; GPU; Intel Xeon Phi; Dynamic programming; Edit distance; Dynamic time warping; ALGORITHM; SEARCH;
D O I
10.1016/j.is.2016.06.001
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Dynamic programming techniques are well-established and employed by-various practical algorithms, including the edit-distance algorithm or the dynamic time warping algorithm. These algorithms usually operate in an iteration-based manner where new values are computed from values of the previous iteration. The data dependencies enforce synchronization which limits possibilities for internal parallel processing. In this paper, we investigate parallel approaches to processing matrix-based dynamic programming algorithms on modern multicore CPUs, Intel Xeon Phi accelerators, and general purpose GPUs. We address both the problem of computing a single distance on large inputs and the problem of computing a number of distances of smaller inputs simultaneously (e.g., when a similarity query is being resolved). Our proposed solutions yielded significant improvements in performance and achieved speedup of two orders of magnitude when compared to the serial baseline. (C) 2016 Elsevier Ltd. All rights reserved.
引用
收藏
页码:175 / 193
页数:19
相关论文
共 26 条
[1]  
Bednarek D., PARALLEL EVALUATION
[2]   Thread-cooperative, Bit-parallel Computation of Levenshtein Distance on GPU [J].
Chacon, Alejandro ;
Marco-Sola, Santiago ;
Espinosa, Antonio ;
Ribeca, Paolo ;
Carlos Moure, Juan .
PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, (ICS'14), 2014, :103-112
[3]  
Delgado G., 2011, Proceedings of the Eighth International Joint Conference on Computer Science and Software Engineering (JCSSE 2011), P234, DOI 10.1109/JCSSE.2011.5930126
[4]   Striped Smith-Waterman speeds database searches six times over other SIMD implementations [J].
Farrar, Michael .
BIOINFORMATICS, 2007, 23 (02) :156-161
[5]   Acceleration of the Smith-Waterman algorithm using single and multiple graphics processors [J].
Khajeh-Saeed, Ali ;
Poole, Stephen ;
Perot, J. Blair .
JOURNAL OF COMPUTATIONAL PHYSICS, 2010, 229 (11) :4247-4258
[6]   Improving Parallel Processing of Matrix-Based Similarity Measures on Modern GPUs [J].
Krulis, Martin ;
Bednarek, David ;
Brabec, Michal .
SIMILARITY SEARCH AND APPLICATIONS, SISAP 2015, 2015, 9371 :283-294
[7]  
Levenshtein V.I., 1966, Soviet Physics Doklady
[8]   AN EFFICIENT IMPLEMENTATION OF SMITH WATERMAN ALGORITHM ON GPU USING CUDA, FOR MASSIVELY PARALLEL SCANNING OF SEQUENCE DATABASES [J].
Ligowski, Lukasz ;
Rudnicki, Witold .
2009 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-5, 2009, :1602-+
[9]  
Liu Y, 2006, LECT NOTES COMPUT SC, V3994, P188
[10]   CUDASW++3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions [J].
Liu, Yongchao ;
Wirawan, Adrianto ;
Schmidt, Bertil .
BMC BIOINFORMATICS, 2013, 14 :117