Learning from Optimizing Matrix-Matrix Multiplication

Cited by: 2
Authors
Parikh, Devangi N. [1 ]
Huang, Jianyu [1 ]
Myers, Margaret E. [1 ]
van de Geijn, Robert A. [1 ]
Affiliations
[1] Univ Texas Austin, Austin, TX 78712 USA
Source
2018 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2018) | 2018
Keywords
parallel computing; high performance computing; matrix-matrix multiplication; computing education; open education;
DOI
10.1109/IPDPSW.2018.00064
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology];
Subject classification code
0812 ;
Abstract
We describe a learning process that uses one of the simplest examples, matrix-matrix multiplication, to illustrate issues that underlie parallel high-performance computing. It is accessible at multiple levels: simple enough to use early in a curriculum yet rich enough to benefit a more advanced software developer. A carefully designed and scaffolded set of exercises leads the learner from a naive implementation towards one that extracts parallelism at multiple levels, ranging from instruction-level parallelism to multithreaded parallelism via OpenMP to distributed-memory parallelism using MPI. The importance of effectively leveraging the memory hierarchy within and across nodes is exposed, as are the GotoBLAS and SUMMA algorithms. These materials will become part of a Massive Open Online Course (MOOC) to be offered in the future.
Pages: 332-339 (8 pages)