Optimizing Dynamic Programming on Graphics Processing Units via Adaptive Thread-Level Parallelism

Citations: 9
Authors
Wu, Chao-Chin [1 ]
Ke, Jenn-Yang [2 ]
Lin, Heshan [3 ]
Feng, Wu-chun [3 ]
Affiliations
[1] Natl Changhua Univ Educ, Dept Comp Sci & Informat Engn, Changhua 500, Taiwan
[2] Tatung Univ, Dept Math Appl, Taipei 104, Taiwan
[3] Virginia Tech, Dept Comp Sci, Blacksburg, VA 24060 USA
Source
2011 IEEE 17TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS) | 2011
Keywords
dynamic programming; GPU; parallel computing; parallelism; optimization; CUDA;
DOI
10.1109/ICPADS.2011.92
CLC Number
TP301 [Theory, Methods];
Discipline Code
081202;
Abstract
Dynamic programming (DP) is an important computational method for solving a wide variety of discrete optimization problems such as scheduling, string editing, packaging, and inventory management. In general, DP is classified into four categories based on the characteristics of the optimization equation. Because applications in the same category of DP exhibit similar program behavior, the research community has sought to propose general solutions for parallelizing each category of DP. However, most existing studies focus on running DP on CPU-based parallel systems rather than on accelerating DP algorithms on the graphics processing unit (GPU). This paper presents the GPU acceleration of an important category of DP problems called nonserial polyadic dynamic programming (NPDP). In NPDP applications, the degree of parallelism varies significantly across different stages of computation, making it difficult to fully utilize the compute power of hundreds of processing cores in a GPU. To address this challenge, we propose a methodology that can adaptively adjust the thread-level parallelism in mapping an NPDP problem onto the GPU, thus providing sufficient and steady degrees of parallelism across different compute stages. We realize our approach in a real-world NPDP application: the optimal matrix parenthesization problem. Experimental results demonstrate that our method can achieve a speedup of 13.40 over the previously published GPU algorithm.
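The irregular parallelism the abstract describes is easy to see in the optimal matrix parenthesization problem itself: the DP table is filled one anti-diagonal at a time, every cell on a diagonal is independent (a candidate for one GPU thread or thread group), and the number of cells per diagonal shrinks as the computation proceeds. The sketch below is an illustrative serial formulation of that structure, not the authors' CUDA implementation; the function name and layout are our own.

```python
def matrix_chain_order(dims):
    """Minimum scalar multiplications to compute A_0 ... A_{n-1},
    where matrix A_i has shape dims[i] x dims[i+1].

    The outer loop over d walks the anti-diagonals (the "stages" of
    the NPDP computation); the n - d cells on diagonal d are mutually
    independent, so a GPU could compute them in parallel. Note that
    the available parallelism drops from n - 1 cells at d = 1 down to
    a single cell at d = n - 1, which is the varying degree of
    parallelism that adaptive thread-level mapping targets.
    """
    n = len(dims) - 1                # number of matrices in the chain
    m = [[0] * n for _ in range(n)]  # m[i][j]: min cost for A_i .. A_j
    for d in range(1, n):            # stage: chain length minus one
        for i in range(n - d):       # independent subproblems on this diagonal
            j = i + d
            m[i][j] = min(
                m[i][k] + m[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return m[0][n - 1]
```

On a GPU, the early diagonals offer many small subproblems (one thread each suffices), while the late diagonals offer few but expensive ones (each cell minimizes over up to n - 1 split points), so a fixed thread-to-cell mapping underutilizes the device in one regime or the other.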
Pages: 96-103 (8 pages)