MATAR: A performance portability and productivity implementation of data-oriented design with Kokkos

Cited by: 6

Authors:

Dunning, Daniel J. ^{[1,3]}

Morgan, Nathaniel R. ^{[2,4]}

Moore, Jacob L. ^{[2,4]}

Nelluvelil, Eappen ^{[2,5]}

Tafolla, Tanya V. ^{[2,6]}

Robey, Robert W. ^{[1]}

Affiliations:

[1] Los Alamos Natl Lab, Eulerian Applicat Grp, Los Alamos, NM 87545 USA

[2] Los Alamos Natl Lab, Continuum Models & Numer Methods Grp, Los Alamos, NM 87545 USA

[3] Texas Tech Univ, Dept Comp Sci, Lubbock, TX 79409 USA

[4] Mississippi State Univ, Dept Mech Engn, Mississippi State, MS USA

[5] Rice Univ, Computat & Appl Math, Houston, TX 77251 USA

[6] Univ Calif Merced, Dept Appl Math, Merced, CA USA

Source:

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING | 2021 / Vol. 157

Keywords:

Performance; Portability; Productivity; Memory efficiency; GPUs; Dense and sparse storage

DOI:

10.1016/j.jpdc.2021.03.016

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

There is a need for simple, fast, and memory-efficient multidimensional data structures for dense and sparse storage that arise with numerical methods and in software applications. The data structures must perform equally well across multiple computer architectures, including CPUs and GPUs. For this purpose, we developed MATAR, a C++ software library that allows for simple creation and use of intricate data structures that are also portable across disparate architectures using Kokkos. The performance aspect is achieved by forcing contiguous memory layout (or as close to contiguous as possible) for multidimensional and multi-size dense or sparse MATrix and ARray (hence, MATAR) types. Our results show that MATAR has the capability to improve memory utilization, performance, and programmer productivity in scientific computing. This is achieved by fitting more work into the available memory, minimizing the memory loads required, and loading memory in the most efficient order. This document describes the purpose of the work, the implementation of each of the data types, and the resulting performance both in some simple baseline test cases and in an application code. © 2021 Elsevier Inc. All rights reserved.
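
The contiguous-layout idea in the abstract can be illustrated with a minimal C++ sketch. This is not MATAR's API and does not use Kokkos: the class names `Dense2D` and `Ragged2D`, their members, and the row-major ordering are assumptions made only for illustration. The sketch shows a dense 2D array and a "multi-size" (ragged) array, each backed by a single flat buffer rather than one allocation per row.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical illustration types; NOT part of MATAR or Kokkos.

// Dense 2D array stored in one contiguous, row-major buffer.
class Dense2D {
public:
    Dense2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    // Map (i, j) to a single flat index: no pointer chasing per row.
    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }

    const double* data() const { return data_.data(); }
    std::size_t size() const { return data_.size(); }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};

// Ragged 2D array: rows of different lengths packed back to back in one
// buffer, with a prefix-sum offset table marking where each row starts.
class Ragged2D {
public:
    explicit Ragged2D(const std::vector<std::size_t>& row_lengths)
        : start_(row_lengths.size() + 1, 0) {
        // start_[i] = sum of the lengths of rows 0 .. i-1
        std::partial_sum(row_lengths.begin(), row_lengths.end(),
                         start_.begin() + 1);
        data_.assign(start_.back(), 0.0);
    }

    std::size_t row_length(std::size_t i) const {
        return start_[i + 1] - start_[i];
    }

    double& operator()(std::size_t i, std::size_t j) {
        assert(i + 1 < start_.size() && j < row_length(i));
        return data_[start_[i] + j];
    }

    const double* data() const { return data_.data(); }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<std::size_t> start_;  // row offsets into data_
    std::vector<double> data_;        // all rows, stored contiguously
};
```

In this sketch, looping over `j` in the innermost loop walks memory in storage order, which is one plausible reading of the abstract's "loading memory in the most efficient order"; the ragged variant stores only the entries that exist, so no padding is allocated for short rows.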

Pages: 86-104

Page count: 19
