Performance Analysis of Speculative Parallel Adaptive Local Timestepping for Conservation Laws

Cited by: 0
Authors
Bremer, Maximilian [1]
Bachan, John [1]
Chan, C. Y. [1]
Dawson, Clint [2]
Affiliations
[1] Lawrence Berkeley Natl Lab, 1 Cyclotron Rd, Berkeley, CA 94720 USA
[2] Univ Texas Austin, Austin, TX 78712 USA
Source
ACM TRANSACTIONS ON MODELING AND COMPUTER SIMULATION | 2022 / Vol. 32 / No. 4
Funding
US National Science Foundation (NSF)
Keywords
Local timestepping; parallel discrete event simulation; Timewarp; shallow water equations; conservation laws; HIGH-RESOLUTION SCHEMES; DISCONTINUOUS GALERKIN METHOD; VARYING TIME; SIMULATION; SYSTEMS; WAVES;
DOI
10.1145/3545996
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline codes
081203; 0835
Abstract
Stable simulation of conservation laws, such as those used to model fluid dynamics and plasma physics applications, requires the satisfaction of the so-called Courant-Friedrichs-Lewy condition. By allowing regions of the mesh to advance with different timesteps that locally satisfy this stability constraint, significant work reduction can be attained when compared to a time integration scheme using a single timestep size. However, parallelizing this algorithm presents considerable difficulty. Since the stability condition depends on the state of the system, dependencies become dynamic and potentially non-local. In this article, we present an adaptive local timestepping algorithm using an optimistic (Timewarp-based) parallel discrete event simulation. We introduce waiting heuristics to limit misspeculation and a semi-static load balancing scheme to eliminate load imbalance as parts of the mesh require finer or coarser timesteps. Last, we outline an interface for separating the physics of the specific conservation law from the temporal integration, allowing for productive adoption of our proposed algorithm. We present a misspeculation study for three conservation laws, demonstrating both the productivity of the local timestepping API, for which 74% of the lines of code are reused across different conservation laws, and the robustness of the waiting heuristics (at most 1.5% of element updates are rolled back). Our performance studies demonstrate up to a 2.8x speedup versus a baseline unoptimized local timestepping approach, a 4x improvement in per-node throughput compared to an MPI parallelization of synchronous timestepping, and scalability up to 3,072 cores on NERSC's Cori Haswell partition.
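To make the work-reduction argument concrete, the sketch below assigns each mesh element a local timestep from the CFL condition, dt_i <= C * dx_i / s_i (C the Courant number, dx_i the element size, s_i the maximum wave speed in the element). It rounds that timestep down to a power-of-two fraction of a coarsest step dt_max and counts element updates against a scheme in which every element uses the finest step. The power-of-two binning, the Element type, and all function names are hypothetical illustrations, not the article's algorithm or API. The sketch also deliberately ignores the dynamic, state-dependent inter-element dependencies that make the parallel version difficult.

```python
import math
from dataclasses import dataclass


@dataclass
class Element:
    # Hypothetical per-element data: size and maximum characteristic speed.
    dx: float
    wave_speed: float


def stable_dt(elem: Element, courant: float) -> float:
    """Largest CFL-stable timestep for one element: C * dx / max wave speed."""
    return courant * elem.dx / elem.wave_speed


def assign_levels(elems, courant, dt_max):
    """Assumed power-of-two binning: element i steps with dt_max / 2**k_i,
    where k_i is the smallest k >= 0 such that dt_max / 2**k <= stable dt."""
    levels = []
    for e in elems:
        dt = stable_dt(e, courant)
        k = max(0, math.ceil(math.log2(dt_max / dt)))
        levels.append(k)
    return levels


def update_counts(levels, t_final, dt_max):
    """Element updates to reach t_final: (local timestepping, single finest step)."""
    n_coarse = round(t_final / dt_max)  # number of coarsest steps
    local = sum(n_coarse * 2**k for k in levels)
    global_finest = len(levels) * n_coarse * 2**max(levels)
    return local, global_finest


if __name__ == "__main__":
    mesh = [Element(2.0, 1.0), Element(1.0, 1.0), Element(2.0, 4.0), Element(4.0, 1.0)]
    lv = assign_levels(mesh, courant=0.5, dt_max=1.0)
    print(lv, update_counts(lv, t_final=10.0, dt_max=1.0))  # prints [0, 1, 2, 0] (80, 160)
```

For this four-element toy mesh, reaching t = 10 with dt_max = 1 takes 80 element updates with local timesteps versus 160 when every element advances with the finest stable step, a 2x reduction. In a real simulation the levels change as the solution evolves, which is why the parallel schedule cannot be fixed in advance.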
Pages: 30