AN OPTIMIZED SPARSE APPROXIMATE MATRIX MULTIPLY FOR MATRICES WITH DECAY

被引：22

作者：

Bock, Nicolas ^{[1
]}

Challacombe, Matt ^{[1
]}

机构：

[1] Los Alamos Natl Lab, Div Theoret, Los Alamos, NM 87544 USA

来源：

SIAM JOURNAL ON SCIENTIFIC COMPUTING | 2013年 / 35卷 / 01期

关键词：

sparse approximate matrix multiply; sparse linear algebra; SpAMM; reduced complexity algorithm; quantum chemistry; N-body; matrices with decay; LINEAR SCALING COMPUTATION; ADAPTIVE MESH REFINEMENT; ELECTRONIC-STRUCTURE CALCULATIONS; DENSITY-MATRIX; FOCK MATRIX; RECENT PROGRESS; ERROR ESTIMATE; MULTIPLICATION; PERFORMANCE; ALGORITHMS;

D O I：

10.1137/120870761

中图分类号：

O29 [应用数学];

学科分类号：

070104 ;

摘要：

We present an optimized single-precision implementation of the sparse approximate matrix multiply (SpAMM) [M. Challacombe and N. Bock, arXiv 1011.3534, 2010], a fast algorithm for matrix-matrix multiplication for matrices with decay that achieves an O(n log n) computational complexity with respect to matrix dimension n. We find that the max norm of the error achieved with a SpAMM tolerance below 2 x 10(-8) is lower than that of the single-precision general matrix-matrix multiply (SGEMM) for dense quantum chemical matrices, while outperforming SGEMM with a crossover already for small matrices (n similar to 1000). Relative to naive implementations of SpAMM using Intel's Math Kernel Library or AMD's Core Math Library, our optimized version is found to be significantly faster. Detailed performance comparisons are made for quantum chemical matrices with differently structured sub-blocks. Finally, we discuss the potential of improved hardware prefetch to yield 2x to 3x speedups.

引用

页码：C72 / C98

页数：27

共 50 条

[41] On Implementing Sparse Matrix Multi-Vector Multiplication on GPUs
Abu-Sufah, Walid
Ahmad, Khalid
2014 IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2014 IEEE 6TH INTL SYMP ON CYBERSPACE SAFETY AND SECURITY, 2014 IEEE 11TH INTL CONF ON EMBEDDED SOFTWARE AND SYST (HPCC,CSS,ICESS), 2014, : 1117 - 1124
[42] Sparse Matrix-Vector Multiplication on a Reconfigurable Supercomputer with Application
Dubois, David
Dubois, Andrew
Boorman, Thomas
Connor, Carolyn
Poole, Steve
ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2010, 3 (01)
[43] Sparse Approximate Multifrontal Factorization with Composite Compression Methods
Claus, Lisa
Ghysels, Pieter
Liu, Yang
Nhan, Thai Anh
Thirumalaisamy, Ramakrishnan
Bhalla, Amneet Pal Singh
Li, Sherry
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2023, 49 (03):
[44] THE USE OF SUPERNODES IN FACTORED SPARSE APPROXIMATE INVERSE PRECONDITIONING
Janna, Carlo
Ferronato, Massimiliano
Gambolati, Giuseppe
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2015, 37 (01): : C72 - C94
[45] Distributed Approximate Message Passing for Sparse Signal Recovery
Han, Puxiao
Niu, Ruixin
Ren, Mengqi
Eldar, Yonina C.
2014 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP), 2014, : 497 - 501
[46] A novel graph matrix representation: sequence of neighbourhood matrices with an application
Karunakaran, Sivakumar
Selvaganesh, Lavanya
SN APPLIED SCIENCES, 2020, 2 (05):
[47] Fast Approximate Joint Diagonalization Incorporating Weight Matrices
Tichavsky, Petr
Yeredor, Arie
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2009, 57 (03) : 878 - 891
[48] I/O-Optimal Cache-Oblivious Sparse Matrix-Sparse Matrix Multiplication
Gleinig, Niels
Besta, Maciej
Hoefler, Torsten
2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2022), 2022, : 36 - 46
[49] Acceleration of Approximate Matrix Multiplications on GPUs
Okuyama, Takuya
Rohm, Andre
Mihana, Takatomo
Naruse, Makoto
ENTROPY, 2023, 25 (08)
[50] Direct Sparse Factorization of Blocked Saddle Point Matrices
Lacoursiere, Claude
Linde, Mattias
Sabelstrom, Olof
APPLIED PARALLEL AND SCIENTIFIC COMPUTING, PT II, 2012, 7134 : 324 - 335

← 1 2 3 4 5 →