Massively Parallel Algorithm and Implementation of RI-MP2 Energy Calculation for Peta-Scale Many-Core Supercomputers
Cited by: 29
Authors:
Katouda, Michio [1]
Naruse, Akira [2]
Hirano, Yukihiko [2]
Nakajima, Takahito [1]
Affiliations:
[1] RIKEN, Adv Inst Computat Sci, Computat Mol Sci Res Team, Chuo Ku, 7-1-26 Minatojima Minami Machi, Kobe, Hyogo 6500047, Japan
[2] NVIDIA Corp, Minato Ku, 2-11-7 Akasaka, Tokyo 1070052, Japan
A new parallel algorithm and its implementation for the RI-MP2 energy calculation utilizing peta-flop-class many-core supercomputers are presented. Two improvements over the previous algorithm (J. Chem. Theory Comput. 2013, 9, 5373) have been made: (1) a dual-level hierarchical parallelization scheme that enables the use of more than 10,000 Message Passing Interface (MPI) processes and (2) a new data communication scheme that reduces network communication overhead. A multi-node and multi-GPU implementation of the present algorithm is presented for calculations on a central processing unit (CPU)/graphics processing unit (GPU) hybrid supercomputer. Benchmark results of the new algorithm and its implementation using the K computer (CPU clustering system) and TSUBAME 2.5 (CPU/GPU hybrid system) demonstrate high efficiency. A peak performance of 3.1 PFLOPS is attained using 80,199 nodes of the K computer. The peak performance of the multi-node and multi-GPU implementation is 514 TFLOPS using 1349 nodes and 4047 GPUs of TSUBAME 2.5. © 2016 Wiley Periodicals, Inc.
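To put the two benchmark figures on a comparable footing, the short sketch below divides each reported aggregate rate by the number of nodes or GPUs quoted in the abstract. It is only arithmetic on those quoted numbers. The helper name `per_unit_gflops` and the variable names are illustrative and not taken from the paper, and an even split of the aggregate rate across units is assumed.

```python
# Figures as quoted in the abstract; everything else here is illustrative.
K_PFLOPS = 3.1          # aggregate rate reported on the K computer
K_NODES = 80_199        # nodes used on the K computer
TSUBAME_TFLOPS = 514    # aggregate rate reported on TSUBAME 2.5
TSUBAME_NODES = 1349    # nodes used on TSUBAME 2.5
TSUBAME_GPUS = 4047     # GPUs used on TSUBAME 2.5


def per_unit_gflops(total_flops: float, units: int) -> float:
    """Return the average rate per unit in GFLOPS, assuming an even split."""
    return total_flops / units / 1e9


# Average rate per K computer node (hypothetical even split).
k_per_node = per_unit_gflops(K_PFLOPS * 1e15, K_NODES)

# Average rate per TSUBAME 2.5 GPU (ignores any CPU contribution).
tsubame_per_gpu = per_unit_gflops(TSUBAME_TFLOPS * 1e12, TSUBAME_GPUS)

# GPUs per node implied by the quoted counts.
gpus_per_node = TSUBAME_GPUS / TSUBAME_NODES

print(f"K computer: {k_per_node:.1f} GFLOPS per node")
print(f"TSUBAME 2.5: {tsubame_per_gpu:.1f} GFLOPS per GPU")
print(f"TSUBAME 2.5: {gpus_per_node:.0f} GPUs per node")
```

Under these assumptions the quoted counts correspond to three GPUs per TSUBAME 2.5 node, roughly 39 GFLOPS per K computer node, and roughly 127 GFLOPS per GPU. These are averages derived from the abstract, not measurements reported by the authors.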