Distributed-Memory Hierarchical Compression of Dense SPD Matrices

Times Cited: 0
Authors
Yu, Chenhan D. [1 ]
Reiz, Severin [2 ]
Biros, George [3 ]
Affiliations
[1] Technical University of Munich, Department of Computer Science, Munich, Germany
[2] Technical University of Munich, Munich, Germany
[3] University of Texas at Austin, Institute for Computational Engineering and Sciences, Austin, TX 78712, USA
Source
Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'18) | 2018
Keywords
Algorithms; Efficient
DOI
Not available
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
We present a distributed-memory algorithm for the hierarchical compression of symmetric positive definite (SPD) matrices. Our method is based on GOFMM, an algorithm that appeared in doi:10.1145/3126908.3126921. For many SPD matrices, GOFMM enables compression and approximate matrix-vector multiplication in O(N log N) time, as opposed to the O(N^2) required for a dense matrix. However, GOFMM supports only shared-memory parallelism. In this paper, we use the Message Passing Interface (MPI) to extend the ideas of GOFMM to the distributed-memory setting. We also propose and implement an asynchronous algorithm for faster multiplication. We present different usage scenarios on a selection of SPD matrices related to graphs, neural networks, and covariance operators, with results on the Texas Advanced Computing Center's "Stampede 2" system. We also compare with the STRUMPACK software package, which, to our knowledge, is the only other available software that can compress arbitrary SPD matrices in parallel. In our largest run, we compressed a 67M-by-67M matrix in less than three minutes and performed a multiplication with 512 vectors within 5 seconds on 6,144 Intel "Skylake" cores.
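To make the complexity claim in the abstract concrete, the NumPy sketch below builds a toy HODLR-style hierarchical approximation of an SPD kernel matrix and applies it to a vector. It is a minimal illustration under assumed parameters (a Gaussian kernel on sorted 1D points, leaf size 64, fixed rank 8), not the paper's geometry-oblivious GOFMM algorithm or its MPI implementation.

# Illustrative sketch only (not the paper's GOFMM/MPI code): a toy HODLR-style
# compression with dense diagonal leaves and fixed-rank off-diagonal blocks,
# showing how a hierarchical representation replaces an O(N^2) dense matvec
# with roughly O(N * rank * log N) work. Kernel, leaf size, and rank are
# assumptions made for demonstration.
import numpy as np

def low_rank(block, rank):
    # Truncated SVD: block ~= U @ V with U of shape (m, rank), V of shape (rank, n).
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank, :]

def compress(K, leaf=64, rank=8):
    # Recursively split K: keep dense diagonal leaves, low-rank off-diagonal blocks.
    n = K.shape[0]
    if n <= leaf:
        return {"dense": K}
    m = n // 2
    return {"A": compress(K[:m, :m], leaf, rank),
            "D": compress(K[m:, m:], leaf, rank),
            "B": low_rank(K[:m, m:], rank),   # upper-right block
            "C": low_rank(K[m:, :m], rank)}   # lower-left block

def matvec(node, x):
    # Apply the compressed operator; off-diagonal work is O(n * rank) per level.
    if "dense" in node:
        return node["dense"] @ x
    (Ub, Vb), (Uc, Vc) = node["B"], node["C"]
    m = Ub.shape[0]
    y1 = matvec(node["A"], x[:m]) + Ub @ (Vb @ x[m:])
    y2 = matvec(node["D"], x[m:]) + Uc @ (Vc @ x[:m])
    return np.concatenate([y1, y2])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = np.sort(rng.random(1024))  # sorted 1D points => numerically low-rank off-diagonal blocks
    K = np.exp(-(pts[:, None] - pts[None, :]) ** 2 / 0.1) + 1e-3 * np.eye(1024)  # SPD kernel matrix
    x = rng.random(1024)
    H = compress(K)
    rel_err = np.linalg.norm(matvec(H, x) - K @ x) / np.linalg.norm(K @ x)
    print(f"relative matvec error: {rel_err:.2e}")

The point of the sketch is only the cost structure: each level touches every row once through rank-limited factors, which is the source of the O(N log N) scaling the abstract contrasts with the dense O(N^2) multiply.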
Pages: 15
Related Papers
25 in total
  • [1] [Anonymous]. Proceedings of the 19th International Conference on Machine Learning, 2002.
  • [2] [Anonymous]. Hierarchical Matrices, 2008.
  • [3] Benzi M. Acta Numerica, 2005, 14: 1. DOI: 10.1017/S0962492904000212.
  • [4] Byrd R. H., Chin G. M., Neveitt W., Nocedal J. On the Use of Stochastic Hessian Information in Optimization Methods for Machine Learning. SIAM Journal on Optimization, 2011, 21(3): 977-995.
  • [5] Cancedda N., Gaussier E., Goutte C., Renders J. M. Word-Sequence Kernels. Journal of Machine Learning Research, 2003, 3(6): 1059-1082.
  • [6] Chan E., Van Zee F. G., van de Geijn R., Bientinesi P., Quintana-Orti E. S., Quintana-Orti G. SuperMatrix: A Multithreaded Runtime Scheduling System for Algorithms-by-Blocks. PPoPP'08: Proceedings of the 2008 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008: 123-132.
  • [7] Ghysels P., Li X. S., Rouet F.-H., Williams S., Napov A. An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling. SIAM Journal on Scientific Computing, 2016, 38(5): S358-S384.
  • [8] Grasedyck L., Kriemann R., Le Borne S. Parallel Black Box H-LU Preconditioning for Elliptic Boundary Value Problems. Computing and Visualization in Science, 2008, 11(4-6): 273-291.
  • [9] Gray A. G. Advances in Neural Information Processing Systems, 2001, 13: 521.
  • [10] Greengard L. Fast Algorithms for Classical Physics. Science, 1994, 265(5174): 909-914.