Reducing thread divergence in a GPU-accelerated branch-and-bound algorithm

被引：32

作者：

Chakroun, I. ^{[1
]}

Mezmaz, M. ^{[2
]}

Melab, N. ^{[1
]}

Bendjoudi, A. ^{[3
]}

机构：

[1] Univ Lille 1, CNRS LIFL, INRIA Lille Nord Europe, F-59655 Villeneuve Dascq, France

[2] Univ Mons, Math & Operat Res Dept MathRO, B-7000 Mons, Belgium

[3] Ctr Rech Informat Sci & Tech CERIST, Algiers 16030, Algeria

来源：

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2013年 / 25卷 / 08期

关键词：

GPU computing; branch-and-bound algorithms; data parallelism; thread divergence; OPTIMIZATION;

D O I：

10.1002/cpe.2931

中图分类号：

TP31 [计算机软件];

学科分类号：

081202 ; 0835 ;

摘要：

In this paper, we address the design and implementation of graphical processing unit (GPU)-accelerated branch-and-bound algorithms (B&B) for solving flow-shop scheduling optimization problems (FSP). Such applications are CPU-time consuming and highly irregular. On the other hand, GPUs are massively multithreaded accelerators using the single instruction multiple data model at execution. A major issue that arises when executing on GPU, a B&B applied to FSP is thread or branch divergence. Such divergence is caused by the lower bound function of FSP that contains many irregular loops and conditional instructions. Our challenge is therefore to revisit the design and implementation of B&B applied to FSP dealing with thread divergence. Extensive experiments of the proposed approach have been carried out on well-known FSP benchmarks using an Nvidia Tesla (C2050 GPU card (http://www.nvidia.com/docs/IO/43395/NV_DS_Tesla_C2050_C2070_jul10_lores.pdf)). Compared with a CPU-based execution, accelerations up tox77.46 are achieved for large problem instances. Copyright (c) 2012 John Wiley & Sons, Ltd.

引用

页码：1121 / 1136

页数：16

共 20 条

[1] Next-generation acceleration and code optimization for light transport in turbid media using GPUs [J].

Alerstam, Erik ;

Lo, William Chun Yip ;

Han, Tianyi David ;

Rose, Jonathan ;

Andersson-Engels, Stefan ;

Lilge, Lothar .

BIOMEDICAL OPTICS EXPRESS, 2010, 1 (02) :658-675

[2] A parallel algorithm for graph matching and its MasPar implementation [J].

Allen, R ;

Cinque, L ;

Tanimoto, S ;

Shapiro, L ;

Yasuda, D .

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1997, 8 (05) :490-501

[3] Branch-and-bound interval global optimization on shared memory multiprocessors [J].

Casado, L. G. ;

Martinez, J. A. ;

Garcia, I. ;

Hendrix, E. M. T. .

OPTIMIZATION METHODS & SOFTWARE, 2008, 23 (05) :689-701

[4]

Chakroun I, 2011, 9 INT C PAR PROC APP

[5]

Chakroun I, 14 IEEE INT C HIGH P

[6]

Dongarra JJ, 2010, SCI COMPUTING MULTIC

[7] Dynamic warp formation and scheduling for efficient GPU control flow [J].

Fung, Wilson W. L. ;

Sham, Ivan ;

Yuan, George ;

Aamodt, Tor M. .

MICRO-40: PROCEEDINGS OF THE 40TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, 2007, :407-+

[8]

Garey M. R., 1976, Mathematics of Operations Research, V1, P117, DOI 10.1287/moor.1.2.117

[9]

Han Tianyi David, 2011, P 4 WORKSHOP GEN PUR, DOI DOI 10.1145/1964179.1964184

[10]

Johnson S.M., 1954, NAVAL RES LOGISTICS, V1, P61, DOI [DOI 10.1002/NAV.3800010110, 10.1002/nav.3800010110]

← 1 2 →