Bulk execution of the dynamic programming for the optimal polygon triangulation problem on the GPU

被引:3
作者
Yamashita, Kohei [1 ]
Ito, Yasuaki [1 ]
Nakano, Koji [1 ]
机构
[1] Hiroshima Univ, Dept Informat Engn, Kagamiyama 1-4-1, Higashihiroshima, Hiroshima, Japan
关键词
bulk execution; CUDA; dynamic programming; GPGPU; parallel algorithms; triangulation;
D O I
10.1002/cpe.4947
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
The bulk execution is to execute some computation for many different inputs in turn or at the same time. The main contribution of this paper is to propose a parallel processing technique for the bulk execution of the dynamic programming using the GPU (Graphics Processing Unit). Especially, we focus on the optimal polygon triangulation problem for a lot of polygons. We consider programming issues of the GPU architecture such as coalesced memory access of the global memory, warp divergence avoidance, and reduction of CUDA kernel calls. In the GPU implementation, we propose two thread assignment methods that efficiently perform the parallel execution with a lot of threads on thousands of cores in the GPU. The experimental results show that our GPU implementation on NVIDIA TITAN V attains a speed-up factor of up to 106.05 and 26.78 over the single-thread and 8-thread CPU implementations on Intel Core i7-6700K CPU, respectively.
引用
收藏
页数:15
相关论文
共 34 条
[1]  
[Anonymous], 2011, Cuda c/c++ streams and concurrency
[2]  
Bergroth L., 2000, P 7 INT S STRING PRO
[3]   A Dynamic Approach for Workload Partitioning on GPU Architectures [J].
Busato, Federico ;
Bombieri, Nicola .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, 28 (06) :1535-1549
[4]  
Cormen T. H., 1990, Introduction to Algorithms
[5]  
Garg R.P., 2001, Techniques for Optimizing Applications: High Performance Computing
[6]   Performance models for asynchronous data transfers on consumer Graphics Processing Units [J].
Gomez-Luna, Juan ;
Maria Gonzalez-Linares, Jose ;
Ignacio Benavides, Jose ;
Guil, Nicolas .
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2012, 72 (09) :1117-1126
[7]   A Warp-synchronous Implementation for Multiple-length Multiplication on the GPU [J].
Honda, Takumi ;
Ito, Yasuaki ;
Nakano, Koji .
PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), 2015, :96-102
[8]   PARALLEL DYNAMIC-PROGRAMMING [J].
HUANG, SHS ;
LIU, HF ;
VISWANATHAN, V .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1994, 5 (03) :326-328
[9]  
Hwu W.-M.W., 2011, GPU Computing Gems: Emerald Edition
[10]   A GPU Implementation of Dynamic Programming for the Optimal Polygon Triangulation [J].
Ito, Yasuaki ;
Nakano, Koji .
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2013, E96D (12) :2596-2603