Heterogeneous parallelization and acceleration of molecular dynamics simulations in GROMACS

Cited by: 404
Authors
Pall, Szilard [1 ]
Zhmurov, Artem [1 ]
Bauer, Paul [2 ]
Abraham, Mark [2 ]
Lundborg, Magnus [3 ]
Gray, Alan [4 ]
Hess, Berk [2 ]
Lindahl, Erik [2 ,5 ]
Affiliations
[1] KTH Royal Inst Technol, PDC Ctr High Performance Comp, Swedish E Sci Res Ctr, S-10044 Stockholm, Sweden
[2] KTH Royal Inst Technol, Swedish E Sci Res Ctr, Dept Appl Phys, Sci Life Lab, Box 1031, S-17121 Solna, Sweden
[3] ERCO Pharma AB, Stockholm, Sweden
[4] NVIDIA Corp, Reading, Berks, England
[5] Stockholm Univ, Dept Biochem & Biophys, Sci Life Lab, Box 1031, S-17121 Solna, Sweden
Funding
EU Horizon 2020; Swedish Research Council; European Research Council;
Keywords
NONBONDED INTERACTIONS; GPU NODES; ALGORITHMS; EFFICIENT; AMBER; BANG;
DOI
10.1063/5.0018516
Chinese Library Classification
O64 [Physical chemistry (theoretical chemistry); chemical physics];
Discipline codes
070304; 081704;
Abstract
The introduction of accelerator devices such as graphics processing units (GPUs) has had a profound impact on molecular dynamics simulations and has enabled order-of-magnitude performance advances using commodity hardware. To fully reap these benefits, it has been necessary to reformulate some of the most fundamental algorithms, including the Verlet list, pair searching, and cutoffs. Here, we present the heterogeneous parallelization and acceleration design of molecular dynamics implemented in the GROMACS codebase over the last decade. The setup involves a general cluster-based approach to pair lists and non-bonded pair interactions that utilizes both GPU and central processing unit (CPU) single instruction, multiple data (SIMD) acceleration efficiently, including the ability to load-balance tasks between CPUs and GPUs. The algorithm work efficiency is tuned for each type of hardware, and to use accelerators more efficiently, we introduce dual pair lists with rolling pruning updates. Combined with new direct GPU-GPU communication and GPU integration, this enables excellent performance from single-GPU simulations through strong scaling across multiple GPUs and efficient multi-node parallelization.
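The dual pair-list scheme mentioned in the abstract can be illustrated with a small sketch: an outer pair list is built with an extended cutoff at low frequency, while an inner list is re-pruned against the shorter interaction cutoff in rolling chunks so the pruning cost is spread over many steps. The C++ code below is a minimal, hypothetical illustration of that idea, not the GROMACS implementation; all names (Particle, DualPairList, pruneSlice) and the chosen cutoffs and intervals are assumptions for illustration, and a brute-force O(N^2) search stands in for GROMACS's cluster-based pair search.

```cpp
// Minimal sketch of a dual pair list with rolling pruning (illustrative only;
// none of these types or functions are part of the GROMACS API).
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <random>
#include <utility>
#include <vector>

struct Particle { double x, y, z; };
using Pair = std::pair<std::size_t, std::size_t>;

static double dist2(const Particle& a, const Particle& b)
{
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

struct DualPairList
{
    double rOuter;                        // extended cutoff (interaction cutoff + buffer)
    double rInner;                        // interaction cutoff used every step
    int    nParts;                        // number of rolling-prune slices
    std::vector<Pair> outer;              // rebuilt rarely (brute force here for brevity)
    std::vector<std::vector<Pair>> inner; // one pruned slice per rolling-prune part

    // Outer list: all pairs within rOuter, rebuilt infrequently so particles
    // can drift between rebuilds without interactions being missed.
    void buildOuter(const std::vector<Particle>& p)
    {
        outer.clear();
        for (std::size_t i = 0; i < p.size(); ++i)
            for (std::size_t j = i + 1; j < p.size(); ++j)
                if (dist2(p[i], p[j]) < rOuter * rOuter)
                    outer.push_back({i, j});
        inner.assign(nParts, {});
        for (int part = 0; part < nParts; ++part)
            pruneSlice(p, part);          // initial full prune
    }

    // Rolling pruning: re-check one slice of the outer list against rInner,
    // amortizing the pruning cost over several MD steps.
    void pruneSlice(const std::vector<Particle>& p, int part)
    {
        const std::size_t chunk = (outer.size() + nParts - 1) / nParts;
        const std::size_t begin = part * chunk;
        const std::size_t end   = std::min(outer.size(), begin + chunk);
        inner[part].clear();
        for (std::size_t k = begin; k < end; ++k)
            if (dist2(p[outer[k].first], p[outer[k].second]) < rInner * rInner)
                inner[part].push_back(outer[k]);
    }

    std::size_t innerSize() const
    {
        std::size_t n = 0;
        for (const auto& slice : inner) n += slice.size();
        return n;
    }
};

int main()
{
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> u(0.0, 5.0);
    std::vector<Particle> p(1000);
    for (auto& pi : p) pi = {u(rng), u(rng), u(rng)};

    DualPairList list{/*rOuter=*/1.2, /*rInner=*/1.0, /*nParts=*/4, {}, {}};
    const int nstOuter = 100;  // outer-list rebuild interval (steps)

    for (int step = 0; step < 300; ++step)
    {
        if (step % nstOuter == 0) list.buildOuter(p);
        list.pruneSlice(p, step % list.nParts);  // rolling prune, one slice per step
        // ... a per-step force kernel would loop over list.inner only ...
        if (step % nstOuter == 0)
            std::printf("step %3d: outer pairs %zu, pruned inner pairs %zu\n",
                        step, list.outer.size(), list.innerSize());
    }
    return 0;
}
```

Splitting the pruning into slices matters because, as the abstract notes, the scheme is aimed at using accelerators more efficiently: re-pruning a fraction of the list each step keeps the per-step cost small and lets the work be scheduled alongside other tasks instead of paying for a full list rebuild at once.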
Pages: 15