Parallel Performance Optimizations on Unstructured Mesh-Based Simulations

Cited by: 7
Authors
Sarje, Abhinav [1]
Song, Sukhyun [2]
Jacobsen, Douglas [3]
Huck, Kevin [4]
Hollingsworth, Jeffrey [2]
Malony, Allen [4]
Williams, Samuel [1]
Oliker, Leonid [1]
Affiliations
[1] Lawrence Berkeley Natl Lab, Berkeley, CA USA
[2] Univ Maryland, College Pk, MD USA
[3] Los Alamos Natl Lab, Los Alamos, NM 87545 USA
[4] Univ Oregon, Eugene, OR 97403 USA
Source
INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, ICCS 2015 COMPUTATIONAL SCIENCE AT THE GATES OF NATURE | 2015 / Vol. 51
Keywords
Unstructured Mesh; Ocean Modeling; Graph Partitioning; Performance Optimization;
DOI
10.1016/j.procs.2015.05.466
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitionings with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data from runs on thousands of cores of the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2x. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.
Pages: 2016-2025
Page count: 10
Related papers
20 in total
[1]  
Bader M., 2012, INTRO APPL SCI COMPU
[2]   Optimal Cache-Oblivious Mesh Layouts [J].
Bender, Michael A. ;
Kuszmaul, Bradley C. ;
Teng, Shang-Hua ;
Wang, Kebin .
THEORY OF COMPUTING SYSTEMS, 2011, 48 (02) :269-296
[3]  
Berzins M., 2000, APPL MATH MODELLING
[4]  
Buluc A., 2011, International Journal of High Performance Computing Applications IJHPCA
[5]  
Catalyurek U. V., 2011, PaToH: Partitioning tool for hypergraphs
[6]   Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication [J].
Çatalyürek, ÜV ;
Aykanat, C .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1999, 10 (07) :673-693
[7]  
Cuthill E., 1969, P 1969 24 NAT C, P157, DOI 10.1145/800195.805928
[8]  
DENNIS JM, 2007, IEEE INT PAR DISTR P, P1
[9]  
Devine Karen D., 2006, PAR DISTR PROC S 200
[10]   A new metric enabling an exact hypergraph model for the communication volume in distributed-memory parallel applications [J].
Fortmeier, O. ;
Buecker, H. M. ;
Auer, B. O. Fagginger ;
Bisseling, R. H. .
PARALLEL COMPUTING, 2013, 39 (08) :319-335