Parallel Efficient Sparse Matrix-Matrix Multiplication on Multicore Platforms

被引：42

作者：

Patwary, Md. Mostofa Ali ^{[1
]}

Satish, Nadathur Rajagopalan ^{[1
]}

Sundaram, Narayanan ^{[1
]}

Park, Jongsoo ^{[1
]}

Anderson, Michael J. ^{[1
]}

Vadlamudi, Satya Gautam ^{[1
]}

Das, Dipankar ^{[1
]}

Pudov, Sergey G. ^{[2
]}

Pirogov, Vadim O. ^{[2
]}

Dubey, Pradeep ^{[1
]}

机构：

[1] Intel Corp, Parallel Comp Lab, Santa Clara, CA 95054 USA

[2] Intel Corp, Software & Serv Grp, Santa Clara, CA USA

来源：

HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 2015 | 2015年 / 9137卷

关键词：

ALGORITHMS;

D O I：

10.1007/978-3-319-20119-1_4

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

Sparse matrix-matrix multiplication (SpGEMM) is a key kernel in many applications in High Performance Computing such as algebraic multigrid solvers and graph analytics. Optimizing SpGEMM on modern processors is challenging due to random data accesses, poor data locality and load imbalance during computation. In this work, we investigate different partitioning techniques, cache optimizations (using dense arrays instead of hash tables), and dynamic load balancing on SpGEMM using a diverse set of real-world and synthetic datasets. We demonstrate that our implementation outperforms the state-of-the-art using Intel (R) Xeon (R) processors. We are up to 3.8X faster than Intel (R) Math Kernel Library (MKL) and up to 257X faster than CombBLAS. We also outperform the best published GPU implementation of SpGEMM on nVidia GTX Titan and on AMD Radeon HD 7970 by up to 7.3X and 4.5X, respectively on their published datasets. We demonstrate good multi-core scalability (geomean speedup of 18.2X using 28 threads) as compared to MKL which gets 7.5X scaling on 28 threads.

引用

页码：48 / 57

页数：10

共 50 条

[1] Column-Segmented Sparse Matrix-Matrix Multiplication on Multicore CPUs
An, Xiaojing
Catalyurek, Umit, V
2021 IEEE 28TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HIPC 2021), 2021, : 202 - 211
[2] Computationally efficient parallel matrix-matrix multiplication on the torus
Zekri, Ahmed S.
Sedukhin, Stanislav G.
HIGH-PERFORMANCE COMPUTING, 2008, 4759 : 219 - 226
[3] PARALLEL SPARSE MATRIX-MATRIX MULTIPLICATION AND INDEXING: IMPLEMENTATION AND EXPERIMENTS
Buluc, Aydin
Gilbert, John R.
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2012, 34 (04): : C170 - C191
[4] Partitioning Models for Scaling Parallel Sparse Matrix-Matrix Multiplication
Akbudak, Kadir
Selvitopi, Oguz
Aykanat, Cevdet
ACM TRANSACTIONS ON PARALLEL COMPUTING, 2018, 4 (03)
[5] Matrix-matrix multiplication on heterogeneous platforms
Beaumont, O
Boudet, V
Rastello, F
Robert, Y
2000 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, PROCEEDINGS, 2000, : 289 - 298
[6] Register-Aware Optimizations for Parallel Sparse Matrix-Matrix Multiplication
Liu, Junhong
He, Xin
Liu, Weifeng
Tan, Guangming
INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2019, 47 (03) : 403 - 417
[7] Brief Announcement: Hypergraph Partitioning for Parallel Sparse Matrix-Matrix Multiplication
Ballard, Grey
Druinsky, Alex
Knight, Nicholas
Schwartz, Oded
SPAA'15: PROCEEDINGS OF THE 27TH ACM SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES, 2015, : 86 - 88
[8] PERFORMANCE EVALUATION OF SPARSE MATRIX-MATRIX MULTIPLICATION
Jain-Mendon, Shweta
Sass, Ron
2013 23RD INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL 2013) PROCEEDINGS, 2013,
[9] Hypergraph partitioning for sparse matrix-matrix multiplication
Ballard G.
Druinsky A.
Knight N.
Schwartz O.
ACM Transactions on Parallel Computing, 2016, 3 (03) : 1 - 34
[10] An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data
Liu, Weifeng
Vinter, Brian
2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2014,

← 1 2 3 4 5 →