Solving path problems on the GPU

被引:49
作者
Buluc, Aydin [1 ]
Gilbert, John R. [1 ]
Budak, Ceren [1 ]
机构
[1] Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara, CA 93106 USA
关键词
All-pairs shortest-paths; Gaussian elimination; Graphical processing unit; Semiring; Matrix multiplication; Graph algorithm; Shortest path; Linear algebra; SHORTEST-PATH; ALGORITHMS;
D O I
10.1016/j.parco.2009.12.002
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We consider the computation of shortest paths on Graphic Processing Units (GPUs). The blocked recursive elimination strategy we use is applicable to a class of algorithms (such as all-pairs shortest-paths, transitive closure, and LU decomposition without pivoting) having similar data access patterns. Using the all-pairs shortest-paths problem as an example, we uncover potential gains over this class of algorithms. The impressive computational power and memory bandwidth of the CPU make it an attractive platform to run such computationally intensive algorithms. Although improvements over CPU implementations have previously been achieved for those algorithms in terms of raw speed, the utilization of the underlying computational resources was quite low. We implemented a recursively partitioned all-pairs shortest-paths algorithm that harnesses the power of GPUs better than existing implementations. The alternate schedule of path computations allowed us to cast almost all operations into matrix-matrix multiplications on a semiring. Since matrix-matrix multiplication is highly optimized and has a high ratio of computation to communication, our implementation does not suffer from the premature saturation of bandwidth resources as iterative algorithms do. By increasing temporal locality, our implementation runs more than two orders of magnitude faster on an NVIDIA 8800 CPU than on an Opteron. Our work provides evidence that programmers should rethink algorithms instead of directly porting them to CPU. (C) 2009 Elsevier By. All rights reserved.
引用
收藏
页码:241 / 253
页数:13
相关论文
共 38 条
[1]  
Aho A. V., 1974, The design and analysis of computer algorithms
[2]  
*AMD, AMD STREAM COMP
[3]  
[Anonymous], [No title captured]
[4]  
[Anonymous], 2001, The Boost Graph Library: User Guide and Reference Manual, Portable Documents
[5]   Sparse matrix solvers on the GPU:: Conjugate gradients and multigrid [J].
Bolz, J ;
Farmer, I ;
Grinspun, E ;
Schröder, P .
ACM TRANSACTIONS ON GRAPHICS, 2003, 22 (03) :917-924
[6]  
Buluc Aydin, 2008, 2008 37th International Conference on Parallel Processing (ICPP), P503, DOI 10.1109/ICPP.2008.45
[7]  
CHOWDHURY RA, 2006, SODA 06, P591
[8]  
*CILK ARTS INC, 2009, CILK PROGR GUID
[9]  
Coppersmith D., 1987, STOC 87, P1, DOI DOI 10.1145/28395.28396
[10]   R-Kleene: A high-performance divide-and-conquer algorithm for the all-pair shortest path for densely connected networks [J].
D'Alberto, Paolo ;
Nicolau, Alexandru .
ALGORITHMICA, 2007, 47 (02) :203-213