A sparse octree gravitational N-body code that runs entirely on the GPU processor

Cited by: 105
Authors
Bedorf, Jeroen [1]
Gaburov, Evghenii [1, 2]
Zwart, Simon Portegies [1]
Affiliations
[1] Leiden Univ, Leiden Observ, NL-2300 RA Leiden, Netherlands
[2] Northwestern Univ, Evanston, IL 60208 USA
Keywords
GPU; Parallel; Tree-code; N-body; Gravity; Hierarchical
DOI
10.1016/j.jcp.2011.12.024
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
We present the implementation and performance of a new gravitational N-body tree-code that is specifically designed for the graphics processing unit (GPU). All parts of the tree-code algorithm are executed on the GPU. We present algorithms for parallel construction and traversal of sparse octrees. These algorithms are implemented in CUDA and tested on NVIDIA GPUs, but they are portable to OpenCL and can easily be used on many-core devices from other manufacturers. This portability is achieved by using general parallel-scan and sort methods. The gravitational tree-code outperforms tuned CPU code during the tree construction and shows a performance improvement of more than a factor of 20 overall, resulting in a processing rate of more than 2.8 million particles per second. © 2011 Elsevier Inc. All rights reserved.
Pages: 2825-2839
Page count: 15