AN OPTIMIZED SPARSE APPROXIMATE MATRIX MULTIPLY FOR MATRICES WITH DECAY

被引:22
|
作者
Bock, Nicolas [1 ]
Challacombe, Matt [1 ]
机构
[1] Los Alamos Natl Lab, Div Theoret, Los Alamos, NM 87544 USA
来源
SIAM JOURNAL ON SCIENTIFIC COMPUTING | 2013年 / 35卷 / 01期
关键词
sparse approximate matrix multiply; sparse linear algebra; SpAMM; reduced complexity algorithm; quantum chemistry; N-body; matrices with decay; LINEAR SCALING COMPUTATION; ADAPTIVE MESH REFINEMENT; ELECTRONIC-STRUCTURE CALCULATIONS; DENSITY-MATRIX; FOCK MATRIX; RECENT PROGRESS; ERROR ESTIMATE; MULTIPLICATION; PERFORMANCE; ALGORITHMS;
D O I
10.1137/120870761
中图分类号
O29 [应用数学];
学科分类号
070104 ;
摘要
We present an optimized single-precision implementation of the sparse approximate matrix multiply (SpAMM) [M. Challacombe and N. Bock, arXiv 1011.3534, 2010], a fast algorithm for matrix-matrix multiplication for matrices with decay that achieves an O(n log n) computational complexity with respect to matrix dimension n. We find that the max norm of the error achieved with a SpAMM tolerance below 2 x 10(-8) is lower than that of the single-precision general matrix-matrix multiply (SGEMM) for dense quantum chemical matrices, while outperforming SGEMM with a crossover already for small matrices (n similar to 1000). Relative to naive implementations of SpAMM using Intel's Math Kernel Library or AMD's Core Math Library, our optimized version is found to be significantly faster. Detailed performance comparisons are made for quantum chemical matrices with differently structured sub-blocks. Finally, we discuss the potential of improved hardware prefetch to yield 2x to 3x speedups.
引用
收藏
页码:C72 / C98
页数:27
相关论文
共 50 条
  • [41] On Implementing Sparse Matrix Multi-Vector Multiplication on GPUs
    Abu-Sufah, Walid
    Ahmad, Khalid
    2014 IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2014 IEEE 6TH INTL SYMP ON CYBERSPACE SAFETY AND SECURITY, 2014 IEEE 11TH INTL CONF ON EMBEDDED SOFTWARE AND SYST (HPCC,CSS,ICESS), 2014, : 1117 - 1124
  • [42] Sparse Matrix-Vector Multiplication on a Reconfigurable Supercomputer with Application
    Dubois, David
    Dubois, Andrew
    Boorman, Thomas
    Connor, Carolyn
    Poole, Steve
    ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2010, 3 (01)
  • [43] Sparse Approximate Multifrontal Factorization with Composite Compression Methods
    Claus, Lisa
    Ghysels, Pieter
    Liu, Yang
    Nhan, Thai Anh
    Thirumalaisamy, Ramakrishnan
    Bhalla, Amneet Pal Singh
    Li, Sherry
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2023, 49 (03):
  • [44] THE USE OF SUPERNODES IN FACTORED SPARSE APPROXIMATE INVERSE PRECONDITIONING
    Janna, Carlo
    Ferronato, Massimiliano
    Gambolati, Giuseppe
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2015, 37 (01): : C72 - C94
  • [45] Distributed Approximate Message Passing for Sparse Signal Recovery
    Han, Puxiao
    Niu, Ruixin
    Ren, Mengqi
    Eldar, Yonina C.
    2014 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP), 2014, : 497 - 501
  • [46] A novel graph matrix representation: sequence of neighbourhood matrices with an application
    Karunakaran, Sivakumar
    Selvaganesh, Lavanya
    SN APPLIED SCIENCES, 2020, 2 (05):
  • [47] Fast Approximate Joint Diagonalization Incorporating Weight Matrices
    Tichavsky, Petr
    Yeredor, Arie
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2009, 57 (03) : 878 - 891
  • [48] I/O-Optimal Cache-Oblivious Sparse Matrix-Sparse Matrix Multiplication
    Gleinig, Niels
    Besta, Maciej
    Hoefler, Torsten
    2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2022), 2022, : 36 - 46
  • [49] Acceleration of Approximate Matrix Multiplications on GPUs
    Okuyama, Takuya
    Rohm, Andre
    Mihana, Takatomo
    Naruse, Makoto
    ENTROPY, 2023, 25 (08)
  • [50] Direct Sparse Factorization of Blocked Saddle Point Matrices
    Lacoursiere, Claude
    Linde, Mattias
    Sabelstrom, Olof
    APPLIED PARALLEL AND SCIENTIFIC COMPUTING, PT II, 2012, 7134 : 324 - 335