Optimizing Dynamic Programming on Graphics Processing Units via Adaptive Thread-Level Parallelism

Citations: 9
Authors
Wu, Chao-Chin [1 ]
Ke, Jenn-Yang [2 ]
Lin, Heshan [3 ]
Feng, Wu-chun [3 ]
Affiliations
[1] Natl Changhua Univ Educ, Dept Comp Sci & Informat Engn, Changhua 500, Taiwan
[2] Tatung Univ, Dept Math Appl, Taipei 104, Taiwan
[3] Virginia Tech, Dept Comp Sci, Blacksburg, VA 24060 USA
Source
2011 IEEE 17TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS) | 2011
Keywords
dynamic programming; GPU; parallel computing; parallelism; optimization; CUDA;
DOI
10.1109/ICPADS.2011.92
CLC Number
TP301 [Theory, Methods];
Discipline Code
081202;
Abstract
Dynamic programming (DP) is an important computational method for solving a wide variety of discrete optimization problems such as scheduling, string editing, packaging, and inventory management. In general, DP is classified into four categories based on the characteristics of the optimization equation. Because applications in the same category of DP exhibit similar program behavior, the research community has sought to propose general solutions for parallelizing each category of DP. However, most existing studies focus on running DP on CPU-based parallel systems rather than on accelerating DP algorithms on the graphics processing unit (GPU). This paper presents the GPU acceleration of an important category of DP problems called nonserial polyadic dynamic programming (NPDP). In NPDP applications, the degree of parallelism varies significantly across different stages of computation, making it difficult to fully utilize the compute power of hundreds of processing cores in a GPU. To address this challenge, we propose a methodology that can adaptively adjust the thread-level parallelism in mapping an NPDP problem onto the GPU, thus providing sufficient and steady degrees of parallelism across different compute stages. We realize our approach in a real-world NPDP application: the optimal matrix parenthesization problem. Experimental results demonstrate that our method can achieve a speedup of 13.40 over the previously published GPU algorithm.
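The irregular parallelism the abstract describes is easy to see in the optimal matrix parenthesization problem itself: the DP table is filled one anti-diagonal at a time, every cell on a diagonal is independent (a candidate for one GPU thread or thread group), and the number of cells per diagonal shrinks as the computation proceeds. The sketch below is an illustrative serial formulation of that structure, not the authors' CUDA implementation; the function name and layout are our own.

```python
def matrix_chain_order(dims):
    """Minimum scalar multiplications to compute A_0 ... A_{n-1},
    where matrix A_i has shape dims[i] x dims[i+1].

    The outer loop over d walks the anti-diagonals (the "stages" of
    the NPDP computation); the n - d cells on diagonal d are mutually
    independent, so a GPU could compute them in parallel. Note that
    the available parallelism drops from n - 1 cells at d = 1 down to
    a single cell at d = n - 1, which is the varying degree of
    parallelism that adaptive thread-level mapping targets.
    """
    n = len(dims) - 1                # number of matrices in the chain
    m = [[0] * n for _ in range(n)]  # m[i][j]: min cost for A_i .. A_j
    for d in range(1, n):            # stage: chain length minus one
        for i in range(n - d):       # independent subproblems on this diagonal
            j = i + d
            m[i][j] = min(
                m[i][k] + m[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return m[0][n - 1]
```

On a GPU, the early diagonals offer many small subproblems (one thread each suffices), while the late diagonals offer few but expensive ones (each cell minimizes over up to n - 1 split points), so a fixed thread-to-cell mapping underutilizes the device in one regime or the other.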
Pages: 96-103 (8 pages)