A Performance Analysis of SIMD Algorithms for Monte Carlo Simulations of Nuclear Reactor Cores

Cited by: 6
Authors
Ozog, David [1]
Malony, Allen D. [1]
Siegel, Andrew R. [2]
Affiliations
[1] Univ Oregon, Dept Comp & Informat Sci, Eugene, OR 97403 USA
[2] Argonne Natl Lab, Div Math & Comp Sci, Argonne, IL 60439 USA
Source
2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS) | 2015
Keywords
Monte Carlo; neutron transport; reactor simulation; performance; SIMD; Intel Xeon Phi coprocessor; MIC; TRANSPORT CODE; REPRESENTATION
DOI
10.1109/IPDPS.2015.105
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
A primary characteristic of history-based Monte Carlo neutron transport simulation is the application of MIMD-style parallelism: the path of each neutron particle is largely independent of all other particles, so threads of execution perform independent instructions with respect to other threads. This conflicts with the growing trend of HPC vendors exploiting SIMD hardware, which accomplishes better parallelism and more FLOPS per watt. Event-based neutron transport suits vectorization better than history-based transport, but it is difficult to implement and complicates data management and transfer. However, the Intel Xeon Phi architecture supports the familiar x86 instruction set and memory model, mitigating difficulties in vectorizing neutron transport codes. This paper compares the event-based and history-based approaches for exploiting SIMD in Monte Carlo neutron transport simulations. For both algorithms, we analyze performance using the three different execution models provided by the Xeon Phi (offload, native, and symmetric) within the full-featured OpenMC framework. A representative micro-benchmark of the performance bottleneck computation shows about 10x performance improvement using the event-based method. In an optimized history-based simulation of a full-physics nuclear reactor core in OpenMC, the MIC shows a calculation rate 1.6x higher than a modern 16-core CPU, 2.5x higher when balancing load between the CPU and 1 MIC, and 4x higher when balancing load between the CPU and 2 MICs. As far as we are aware, our calculation rate per node on a high fidelity benchmark (17,098 particles/second) is higher than any other Monte Carlo neutron transport application. Furthermore, we attain 95% distributed efficiency when using MPI and up to 512 concurrent MIC devices.
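The structural difference between the two approaches named in the abstract can be illustrated with a minimal sketch. This is a toy model, not OpenMC code and not the authors' implementation: the particle record, the constants `SIGMA_T` and `ABSORB_PROB`, and the helper names are all hypothetical, and the physics is reduced to "fly, then collide, then possibly get absorbed". The history-based loop follows one particle from birth to death before starting the next; the event-based loop applies one event type to the whole bank of live particles at a time, which is the loop shape that lends itself to vectorization.

```python
import random

SIGMA_T = 1.0      # hypothetical total macroscopic cross section (1/cm)
ABSORB_PROB = 0.3  # hypothetical absorption probability per collision


def new_particle(pid, seed=0):
    # Each particle owns its random stream, so both algorithms
    # consume identical random numbers for a given particle.
    return {"rng": random.Random(seed * 1_000_003 + pid),
            "path": 0.0, "collisions": 0, "alive": True}


def advance(p):
    # Event 1: sample the flight distance to the next collision.
    p["path"] += p["rng"].expovariate(SIGMA_T)


def collide(p):
    # Event 2: collide; the particle is absorbed with fixed probability.
    p["collisions"] += 1
    if p["rng"].random() < ABSORB_PROB:
        p["alive"] = False


def history_based(n, seed=0):
    # Outer loop over particles, inner loop over that particle's events.
    results = []
    for pid in range(n):
        p = new_particle(pid, seed)
        while p["alive"]:
            advance(p)
            collide(p)
        results.append((p["collisions"], p["path"]))
    return results


def event_based(n, seed=0):
    # Outer loop over event stages, inner loops over the bank of live
    # particles. Each inner loop runs the same operation on every
    # particle, which is what a SIMD implementation would vectorize.
    particles = [new_particle(pid, seed) for pid in range(n)]
    bank = list(particles)
    while bank:
        for p in bank:
            advance(p)
        for p in bank:
            collide(p)
        bank = [p for p in bank if p["alive"]]  # compact the bank
    return [(p["collisions"], p["path"]) for p in particles]
```

Because each particle carries its own random stream, both functions return identical tallies in this sketch; only the loop ordering changes. The bank compaction step hints at the extra data management that the abstract says the event-based method requires.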
Pages: 733-742
Number of pages: 10