MATAR: A performance portability and productivity implementation of data-oriented design with Kokkos

Cited by: 6
Authors
Dunning, Daniel J. [1, 3]
Morgan, Nathaniel R. [2, 4]
Moore, Jacob L. [2, 4]
Nelluvelil, Eappen [2, 5]
Tafolla, Tanya V. [2, 6]
Robey, Robert W. [1]
Affiliations
[1] Los Alamos Natl Lab, Eulerian Applicat Grp, Los Alamos, NM 87545 USA
[2] Los Alamos Natl Lab, Continuum Models & Numer Methods Grp, Los Alamos, NM 87545 USA
[3] Texas Tech Univ, Dept Comp Sci, Lubbock, TX 79409 USA
[4] Mississippi State Univ, Dept Mech Engn, Mississippi State, MS USA
[5] Rice Univ, Computat & Appl Math, Houston, TX 77251 USA
[6] Univ Calif Merced, Dept Appl Math, Merced, CA USA
Keywords
Performance; Portability; Productivity; Memory efficiency; GPUs; Dense and sparse storage
DOI
10.1016/j.jpdc.2021.03.016
CLC number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
There is a need for simple, fast, and memory-efficient multidimensional data structures for dense and sparse storage that arise with numerical methods and in software applications. The data structures must perform equally well across multiple computer architectures, including CPUs and GPUs. For this purpose, we developed MATAR, a C++ software library that allows for simple creation and use of intricate data structures and that is portable across disparate architectures using Kokkos. The performance aspect is achieved by forcing contiguous memory layout (or as close to contiguous as possible) for multidimensional and multi-size dense or sparse MATrix and ARray (hence, MATAR) types. Our results show that MATAR has the capability to improve memory utilization, performance, and programmer productivity in scientific computing. This is achieved by fitting more work into the available memory, minimizing the memory loads required, and loading memory in the most efficient order. This document describes the purpose of the work, the implementation of each of the data types, and the resulting performance both in some simple baseline test cases and in an application code. (C) 2021 Elsevier Inc. All rights reserved.
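
The contiguous-layout idea in the abstract can be illustrated with a minimal C++ sketch. This is not MATAR's API and does not use Kokkos: the class names `Dense2D` and `Ragged2D` and all of their members are hypothetical, chosen only to show how a dense array and a multi-size (ragged) array can each keep their values in one flat buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only: these classes are NOT part of MATAR.

// Dense 2D array stored in one contiguous, row-major buffer.
class Dense2D {
public:
    Dense2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    // Map (i, j) to a single offset into the flat buffer.
    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }

    const double* data() const { return data_.data(); }
    std::size_t size() const { return data_.size(); }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};

// Ragged ("multi-size") array: each row has its own length, but all
// values still live in one contiguous buffer addressed via row offsets.
class Ragged2D {
public:
    explicit Ragged2D(const std::vector<std::size_t>& row_lengths)
        : start_(row_lengths.size() + 1, 0) {
        // Prefix sum of the row lengths gives the start offset of each row.
        for (std::size_t i = 0; i < row_lengths.size(); ++i) {
            start_[i + 1] = start_[i] + row_lengths[i];
        }
        data_.assign(start_.back(), 0.0);
    }

    std::size_t row_length(std::size_t i) const {
        assert(i + 1 < start_.size());
        return start_[i + 1] - start_[i];
    }

    double& operator()(std::size_t i, std::size_t j) {
        assert(j < row_length(i));
        return data_[start_[i] + j];
    }

    std::size_t size() const { return data_.size(); }

private:
    std::vector<std::size_t> start_;  // start_[i] is the offset of row i
    std::vector<double> data_;        // all rows packed back to back
};
```

In this sketch, a ragged array with row lengths 3, 1, and 2 stores exactly six values in one allocation, rather than three separately allocated rows, so a traversal in row order touches memory sequentially.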
Pages: 86-104
Page count: 19