A Parallel Dynamic Programming Algorithm on a Multi-core Architecture

被引:0
作者
Tan, Guangming [1 ]
Sun, Ninghui [1 ]
Gao, Guang R.
机构
[1] Chinese Acad Sci, Key Lab Comp Syst & Architecture, Beijing, Peoples R China
来源
SPAA'07: PROCEEDINGS OF THE NINETEENTH ANNUAL SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES | 2007年
基金
美国国家科学基金会;
关键词
Dynamic Programming; Data Dependence; Multicore; Memory Hierarchy; Scalabilitiy;
D O I
暂无
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Dynamic programming is an efficient technique to solve combinatorial search and optimization problem. There have been many parallel dynamic programming algorithms. The purpose of this paper is to study a family of dynamic programming algorithm where data dependence appear between non-consecutive stages, in other words, the data dependence is non-uniform. This kind of dynamic programming is typically called nonserial polyadic dynamic programming. Owing to the: non-uniform data dependence; it is harder to optimize this problem for parallelism and locality on parallel architectures. In this paper, we address the chanllenge of exploiting fine grain parallelism and locality of nonserial polyadic dynamic programming on a multi-core architecture. We present a programming and execution model for multi-core architectures with memory hierarchy. In the framework of the new model, the parallelism and locality benifit from a data dependence transformation. We propose a parallel pipelined algorithm for filling the dynamic programming matrix by decomposing the computation operators. The new parallel algorithm tolerates the memory access latency using multi-thread and is easily improved with the technique. We formulate and analytically solve the optimization problem determing the the size that minimizes the total execution time. The experiments on a simulator give a validation of the proposed model and show that the fine grain parallel algorithm achieves sub-linear speedup and that a potential high scalability on multi-core arichitecture.
引用
收藏
页码:135 / +
页数:3
相关论文
共 32 条
[1]   THE INPUT OUTPUT COMPLEXITY OF SORTING AND RELATED PROBLEMS [J].
AGGARWAL, A ;
VITTER, JS .
COMMUNICATIONS OF THE ACM, 1988, 31 (09) :1116-1127
[2]  
ALMEIDA F, 2002, ACM S PAR ARCH ALG, P173
[3]  
[Anonymous], P 5 ACM SIGPLAN S PR
[4]  
[Anonymous], 5 WORKSH MASS PAR PR
[5]  
BADER D, 2005, 12 INT C HIGH PERF C, P465
[6]  
BRADFORD PG, 1992, CONTROL COMPUTING, P185
[7]   Optimization of an RNA folding algorithm for parallel architectures [J].
Chen, JH ;
Le, SY ;
Shapiro, BA ;
Maizel, JV .
PARALLEL COMPUTING, 1998, 24 (11) :1617-1634
[8]  
COLEMAN S, 1995, P ACM SIGPLAN C PROG
[9]  
CUVILLO J, 2005, WORKSH MOD BENCHM SI
[10]  
CUVILLO J, 2006, 20 INT S HIGH PERF C