NBODY6++GPU: ready for the gravitational million-body problem

Times cited: 178
Authors
Wang, Long [1, 2]
Spurzem, Rainer [1, 3, 4, 5]
Aarseth, Sverre [6]
Nitadori, Keigo [7]
Berczik, Peter [3, 4, 5, 8]
Kouwenhoven, M. B. N. [1, 2]
Naab, Thorsten [9]
Affiliations
[1] Peking Univ, Kavli Inst Astron & Astrophys, Beijing 100871, Peoples R China
[2] Peking Univ, Sch Phys, Dept Astron, Beijing 100871, Peoples R China
[3] Chinese Acad Sci, Natl Astron Observ & Key Lab Computat Astrophys, Beijing 100012, Peoples R China
[4] Chinese Acad Sci, Inst Theoret Phys, Key Lab Frontiers Theoret Phys, Beijing 100190, Peoples R China
[5] Heidelberg Univ, Astronom Rechen Inst Zentrum Astron, D-69120 Heidelberg, Germany
[6] Univ Cambridge, Inst Astron, Cambridge CB3 0HA, England
[7] RIKEN, Adv Inst Computat Sci, Kobe, Hyogo, Japan
[8] Natl Acad Sci Ukraine, Main Astron Observ, UA-03680 Kiev, Ukraine
[9] Max Planck Inst Astrophys, D-85741 Garching, Germany
Funding
National Natural Science Foundation of China
Keywords
methods: numerical; globular clusters: general; special-purpose computer; performance analysis; globular clusters; core collapse; implementation; algorithms; scheme; model
DOI
10.1093/mnras/stv817
Chinese Library Classification (CLC) number
P1 [Astronomy]
Discipline classification code
0704
Abstract
Accurate direct N-body simulations help to obtain detailed information about the dynamical evolution of star clusters. They also enable comparisons with analytical models and with Fokker-Planck or Monte Carlo methods. NBODY6 is a well-known direct N-body code for star clusters, and NBODY6++ is its extended version, designed for simulations with large particle numbers on supercomputers. We present NBODY6++GPU, an optimized version of NBODY6++ with hybrid parallelization methods (MPI, GPU, OpenMP, and AVX/SSE) to accelerate large direct N-body simulations, and in particular to solve the million-body problem. We discuss the new features of the NBODY6++GPU code, benchmarks, and the first results from a simulation of a realistic globular cluster initially containing a million particles. For million-body simulations, NBODY6++GPU running on 320 CPU cores and 32 NVIDIA K20X GPUs is 400-2000 times faster than NBODY6. With this computing cluster specification, simulations of million-body globular clusters including 5 per cent primordial binaries require about an hour per half-mass crossing time.
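The abstract does not spell out why a million-body direct simulation needs this much hardware. A minimal sketch of the underlying cost follows: direct-summation codes evaluate the gravitational interaction of every particle with every other one, so one full force evaluation touches N(N-1)/2 pairs, about 5 x 10^11 for N = 10^6. Codes in the NBODY family are generally built on a fourth-order Hermite integrator, which needs both the acceleration and its time derivative (the "jerk"), so the sketch computes both. This is an illustrative, serial pure-Python example under stated assumptions (G = 1, optional Plummer softening `eps`, hypothetical function names `acc_jerk` and `pair_count`); it is not NBODY6++GPU code and omits its neighbour scheme, regularization of close encounters, and MPI/GPU/OpenMP/AVX parallelism.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def acc_jerk(pos: List[Vec], vel: List[Vec], mass: List[float],
             eps: float = 0.0, G: float = 1.0
             ) -> Tuple[List[List[float]], List[List[float]]]:
    """Hypothetical helper: direct O(N^2) summation of acceleration and jerk.

    For each pair (i, j), with r = x_j - x_i and v = v_j - v_i:
      a_i += G m_j r / |r|^3
      j_i += G m_j (v - 3 (r.v) r / |r|^2) / |r|^3
    An optional Plummer softening eps is added to |r|^2 (assumption).
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    jerk = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-interaction
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            dv = [vel[j][k] - vel[i][k] for k in range(3)]
            r2 = sum(c * c for c in dx) + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            rv = sum(dx[k] * dv[k] for k in range(3))
            alpha = 3.0 * rv / r2  # radial-velocity correction term of the jerk
            for k in range(3):
                acc[i][k] += G * mass[j] * dx[k] * inv_r3
                jerk[i][k] += G * mass[j] * (dv[k] - alpha * dx[k]) * inv_r3
    return acc, jerk


def pair_count(n: int) -> int:
    """Number of distinct particle pairs in one full direct-summation step."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Two unit masses one length unit apart; the second moves in +y.
    a, jk = acc_jerk([(-0.5, 0.0, 0.0), (0.5, 0.0, 0.0)],
                     [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
                     [1.0, 1.0])
    print(a[0], jk[0])
    print(pair_count(10**6))  # pairs per full force evaluation for a million bodies
```

Because the double loop is quadratic in N and each pair is independent, this is exactly the kind of kernel that maps well onto GPUs and SIMD units, which is the motivation for the hybrid parallelization the abstract describes.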
Pages: 4070-4080
Number of pages: 11