AN OPTIMIZED SPARSE APPROXIMATE MATRIX MULTIPLY FOR MATRICES WITH DECAY

被引：22

作者：

Bock, Nicolas ^{[1
]}

Challacombe, Matt ^{[1
]}

机构：

[1] Los Alamos Natl Lab, Div Theoret, Los Alamos, NM 87544 USA

来源：

SIAM JOURNAL ON SCIENTIFIC COMPUTING | 2013年 / 35卷 / 01期

关键词：

sparse approximate matrix multiply; sparse linear algebra; SpAMM; reduced complexity algorithm; quantum chemistry; N-body; matrices with decay; LINEAR SCALING COMPUTATION; ADAPTIVE MESH REFINEMENT; ELECTRONIC-STRUCTURE CALCULATIONS; DENSITY-MATRIX; FOCK MATRIX; RECENT PROGRESS; ERROR ESTIMATE; MULTIPLICATION; PERFORMANCE; ALGORITHMS;

D O I：

10.1137/120870761

中图分类号：

O29 [应用数学];

学科分类号：

070104 ;

摘要：

We present an optimized single-precision implementation of the sparse approximate matrix multiply (SpAMM) [M. Challacombe and N. Bock, arXiv 1011.3534, 2010], a fast algorithm for matrix-matrix multiplication for matrices with decay that achieves an O(n log n) computational complexity with respect to matrix dimension n. We find that the max norm of the error achieved with a SpAMM tolerance below 2 x 10(-8) is lower than that of the single-precision general matrix-matrix multiply (SGEMM) for dense quantum chemical matrices, while outperforming SGEMM with a crossover already for small matrices (n similar to 1000). Relative to naive implementations of SpAMM using Intel's Math Kernel Library or AMD's Core Math Library, our optimized version is found to be significantly faster. Detailed performance comparisons are made for quantum chemical matrices with differently structured sub-blocks. Finally, we discuss the potential of improved hardware prefetch to yield 2x to 3x speedups.

引用

页码：C72 / C98

页数：27

共 50 条

[21] Versatile Direct and Transpose Matrix Multiplication with Chained Operations: An Optimized Architecture Using Circulant Matrices
Iakymchuk, Taras
Rosado-Munoz, Alfredo
Bataller Mompean, Manuel
Frances Villora, Jose Vicente
Ovie Osimiry, Emmanuel
IEEE TRANSACTIONS ON COMPUTERS, 2016, 65 (11) : 3470 - 3479
[22] Sparse matrix multiplication: The distributed block-compressed sparse row library
Borstnik, Urban
VandeVondele, Joost
Weber, Valery
Hutter, Juerg
PARALLEL COMPUTING, 2014, 40 (5-6) : 47 - 58
[23] Sparse Matrix-Vector Multiplication on GPGPUs
Filippone, Salvatore
Cardellini, Valeria
Barbieri, Davide
Fanfarillo, Alessandro
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2017, 43 (04):
[24] Adaptive Sparse Tiling for Sparse Matrix Multiplication
Hong, Changwan
Sukumaran-Rajam, Aravind
Nisa, Israt
Singh, Kunal
Sadayappan, P.
PROCEEDINGS OF THE 24TH SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '19), 2019, : 300 - 314
[25] Compressed sensing with sparse, structured matrices
Angelini, Maria Chiara
Ricci-Tersenghi, Federico
Kabashima, Yoshiyuki
2012 50TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2012, : 808 - 814
[26] Sparse Sums of Positive Semidefinite Matrices
De Carli Silva, Marcel K.
Harvey, Nicholas J. A.
Sato, Cristiane M.
ACM TRANSACTIONS ON ALGORITHMS, 2016, 12 (01)
[27] Powers of low rank sparse matrices
Cohen, Keren
THEORETICAL COMPUTER SCIENCE, 2025, 1032
[28] A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors
Liu, Weifeng
Vinter, Brian
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2015, 85 : 47 - 61
[29] IMPLEMENTATION AND COMPUTATIONAL RESULTS FOR THE HIERARCHICAL ALGORITHM FOR MAKING SPARSE MATRICES SPARSER
CHANG, SF
MCCORMICK, ST
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 1993, 19 (03): : 419 - 441
[30] Compressed Sampling of Spectrally Sparse Signals Using Sparse Circulant Matrices
Xu, Guangjie
Wang, Huali
Sun, Lei
Zeng, Weijun
Wang, Qingguo
FREQUENZ, 2014, 68 (11-12) : 573 - 580

← 1 2 3 4 5 →