AN OPTIMIZED SPARSE APPROXIMATE MATRIX MULTIPLY FOR MATRICES WITH DECAY

被引:22
|
作者
Bock, Nicolas [1 ]
Challacombe, Matt [1 ]
机构
[1] Los Alamos Natl Lab, Div Theoret, Los Alamos, NM 87544 USA
来源
SIAM JOURNAL ON SCIENTIFIC COMPUTING | 2013年 / 35卷 / 01期
关键词
sparse approximate matrix multiply; sparse linear algebra; SpAMM; reduced complexity algorithm; quantum chemistry; N-body; matrices with decay; LINEAR SCALING COMPUTATION; ADAPTIVE MESH REFINEMENT; ELECTRONIC-STRUCTURE CALCULATIONS; DENSITY-MATRIX; FOCK MATRIX; RECENT PROGRESS; ERROR ESTIMATE; MULTIPLICATION; PERFORMANCE; ALGORITHMS;
D O I
10.1137/120870761
中图分类号
O29 [应用数学];
学科分类号
070104 ;
摘要
We present an optimized single-precision implementation of the sparse approximate matrix multiply (SpAMM) [M. Challacombe and N. Bock, arXiv 1011.3534, 2010], a fast algorithm for matrix-matrix multiplication for matrices with decay that achieves an O(n log n) computational complexity with respect to matrix dimension n. We find that the max norm of the error achieved with a SpAMM tolerance below 2 x 10(-8) is lower than that of the single-precision general matrix-matrix multiply (SGEMM) for dense quantum chemical matrices, while outperforming SGEMM with a crossover already for small matrices (n similar to 1000). Relative to naive implementations of SpAMM using Intel's Math Kernel Library or AMD's Core Math Library, our optimized version is found to be significantly faster. Detailed performance comparisons are made for quantum chemical matrices with differently structured sub-blocks. Finally, we discuss the potential of improved hardware prefetch to yield 2x to 3x speedups.
引用
收藏
页码:C72 / C98
页数:27
相关论文
共 50 条
  • [21] Versatile Direct and Transpose Matrix Multiplication with Chained Operations: An Optimized Architecture Using Circulant Matrices
    Iakymchuk, Taras
    Rosado-Munoz, Alfredo
    Bataller Mompean, Manuel
    Frances Villora, Jose Vicente
    Ovie Osimiry, Emmanuel
    IEEE TRANSACTIONS ON COMPUTERS, 2016, 65 (11) : 3470 - 3479
  • [22] Sparse matrix multiplication: The distributed block-compressed sparse row library
    Borstnik, Urban
    VandeVondele, Joost
    Weber, Valery
    Hutter, Juerg
    PARALLEL COMPUTING, 2014, 40 (5-6) : 47 - 58
  • [23] Sparse Matrix-Vector Multiplication on GPGPUs
    Filippone, Salvatore
    Cardellini, Valeria
    Barbieri, Davide
    Fanfarillo, Alessandro
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2017, 43 (04):
  • [24] Adaptive Sparse Tiling for Sparse Matrix Multiplication
    Hong, Changwan
    Sukumaran-Rajam, Aravind
    Nisa, Israt
    Singh, Kunal
    Sadayappan, P.
    PROCEEDINGS OF THE 24TH SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '19), 2019, : 300 - 314
  • [25] Compressed sensing with sparse, structured matrices
    Angelini, Maria Chiara
    Ricci-Tersenghi, Federico
    Kabashima, Yoshiyuki
    2012 50TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2012, : 808 - 814
  • [26] Sparse Sums of Positive Semidefinite Matrices
    De Carli Silva, Marcel K.
    Harvey, Nicholas J. A.
    Sato, Cristiane M.
    ACM TRANSACTIONS ON ALGORITHMS, 2016, 12 (01)
  • [27] Powers of low rank sparse matrices
    Cohen, Keren
    THEORETICAL COMPUTER SCIENCE, 2025, 1032
  • [28] A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors
    Liu, Weifeng
    Vinter, Brian
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2015, 85 : 47 - 61
  • [29] IMPLEMENTATION AND COMPUTATIONAL RESULTS FOR THE HIERARCHICAL ALGORITHM FOR MAKING SPARSE MATRICES SPARSER
    CHANG, SF
    MCCORMICK, ST
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 1993, 19 (03): : 419 - 441
  • [30] Compressed Sampling of Spectrally Sparse Signals Using Sparse Circulant Matrices
    Xu, Guangjie
    Wang, Huali
    Sun, Lei
    Zeng, Weijun
    Wang, Qingguo
    FREQUENZ, 2014, 68 (11-12) : 573 - 580