Parallel Efficient Sparse Matrix-Matrix Multiplication on Multicore Platforms

被引:42
|
作者
Patwary, Md. Mostofa Ali [1 ]
Satish, Nadathur Rajagopalan [1 ]
Sundaram, Narayanan [1 ]
Park, Jongsoo [1 ]
Anderson, Michael J. [1 ]
Vadlamudi, Satya Gautam [1 ]
Das, Dipankar [1 ]
Pudov, Sergey G. [2 ]
Pirogov, Vadim O. [2 ]
Dubey, Pradeep [1 ]
机构
[1] Intel Corp, Parallel Comp Lab, Santa Clara, CA 95054 USA
[2] Intel Corp, Software & Serv Grp, Santa Clara, CA USA
来源
HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 2015 | 2015年 / 9137卷
关键词
ALGORITHMS;
D O I
10.1007/978-3-319-20119-1_4
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Sparse matrix-matrix multiplication (SpGEMM) is a key kernel in many applications in High Performance Computing such as algebraic multigrid solvers and graph analytics. Optimizing SpGEMM on modern processors is challenging due to random data accesses, poor data locality and load imbalance during computation. In this work, we investigate different partitioning techniques, cache optimizations (using dense arrays instead of hash tables), and dynamic load balancing on SpGEMM using a diverse set of real-world and synthetic datasets. We demonstrate that our implementation outperforms the state-of-the-art using Intel (R) Xeon (R) processors. We are up to 3.8X faster than Intel (R) Math Kernel Library (MKL) and up to 257X faster than CombBLAS. We also outperform the best published GPU implementation of SpGEMM on nVidia GTX Titan and on AMD Radeon HD 7970 by up to 7.3X and 4.5X, respectively on their published datasets. We demonstrate good multi-core scalability (geomean speedup of 18.2X using 28 threads) as compared to MKL which gets 7.5X scaling on 28 threads.
引用
收藏
页码:48 / 57
页数:10
相关论文
共 50 条
  • [1] Column-Segmented Sparse Matrix-Matrix Multiplication on Multicore CPUs
    An, Xiaojing
    Catalyurek, Umit, V
    2021 IEEE 28TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HIPC 2021), 2021, : 202 - 211
  • [2] Computationally efficient parallel matrix-matrix multiplication on the torus
    Zekri, Ahmed S.
    Sedukhin, Stanislav G.
    HIGH-PERFORMANCE COMPUTING, 2008, 4759 : 219 - 226
  • [3] PARALLEL SPARSE MATRIX-MATRIX MULTIPLICATION AND INDEXING: IMPLEMENTATION AND EXPERIMENTS
    Buluc, Aydin
    Gilbert, John R.
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2012, 34 (04): : C170 - C191
  • [4] Partitioning Models for Scaling Parallel Sparse Matrix-Matrix Multiplication
    Akbudak, Kadir
    Selvitopi, Oguz
    Aykanat, Cevdet
    ACM TRANSACTIONS ON PARALLEL COMPUTING, 2018, 4 (03)
  • [5] Matrix-matrix multiplication on heterogeneous platforms
    Beaumont, O
    Boudet, V
    Rastello, F
    Robert, Y
    2000 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, PROCEEDINGS, 2000, : 289 - 298
  • [6] Register-Aware Optimizations for Parallel Sparse Matrix-Matrix Multiplication
    Liu, Junhong
    He, Xin
    Liu, Weifeng
    Tan, Guangming
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2019, 47 (03) : 403 - 417
  • [7] Brief Announcement: Hypergraph Partitioning for Parallel Sparse Matrix-Matrix Multiplication
    Ballard, Grey
    Druinsky, Alex
    Knight, Nicholas
    Schwartz, Oded
    SPAA'15: PROCEEDINGS OF THE 27TH ACM SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES, 2015, : 86 - 88
  • [8] PERFORMANCE EVALUATION OF SPARSE MATRIX-MATRIX MULTIPLICATION
    Jain-Mendon, Shweta
    Sass, Ron
    2013 23RD INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL 2013) PROCEEDINGS, 2013,
  • [9] Hypergraph partitioning for sparse matrix-matrix multiplication
    Ballard G.
    Druinsky A.
    Knight N.
    Schwartz O.
    ACM Transactions on Parallel Computing, 2016, 3 (03) : 1 - 34
  • [10] An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data
    Liu, Weifeng
    Vinter, Brian
    2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2014,