Distributed-Memory Hierarchical Compression of Dense SPD Matrices

Times Cited: 0
Authors
Yu, Chenhan D. [1 ]
Reiz, Severin [2 ]
Biros, George [3 ]
Affiliations
[1] Technical University of Munich, Department of Computer Science, Munich, Germany
[2] Technical University of Munich, Munich, Germany
[3] University of Texas at Austin, Institute for Computational Engineering and Sciences, Austin, TX 78712, USA
Source
Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'18) | 2018
Keywords
Algorithms; Efficient
DOI
Not available
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
We present a distributed-memory algorithm for the hierarchical compression of symmetric positive definite (SPD) matrices. Our method is based on GOFMM, an algorithm that appeared in doi:10.1145/3126908.3126921. For many SPD matrices, GOFMM enables compression and approximate matrix-vector multiplication in O(N log N) time, as opposed to the O(N^2) required for a dense matrix. However, GOFMM supports only shared-memory parallelism. In this paper, we use the Message Passing Interface (MPI) to extend the ideas of GOFMM to the distributed-memory setting. We also propose and implement an asynchronous algorithm for faster multiplication. We present different usage scenarios on a selection of SPD matrices related to graphs, neural networks, and covariance operators, with results on the Texas Advanced Computing Center's "Stampede 2" system. We also compare with the STRUMPACK software package, which, to our knowledge, is the only other available software that can compress arbitrary SPD matrices in parallel. In our largest run, we compressed a 67M-by-67M matrix in less than three minutes and performed a multiplication with 512 vectors within 5 seconds on 6,144 Intel "Skylake" cores.
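To make the complexity claim in the abstract concrete, the NumPy sketch below builds a toy HODLR-style hierarchical approximation of an SPD kernel matrix and applies it to a vector. It is a minimal illustration under assumed parameters (a Gaussian kernel on sorted 1D points, leaf size 64, fixed rank 8), not the paper's geometry-oblivious GOFMM algorithm or its MPI implementation.

# Illustrative sketch only (not the paper's GOFMM/MPI code): a toy HODLR-style
# compression with dense diagonal leaves and fixed-rank off-diagonal blocks,
# showing how a hierarchical representation replaces an O(N^2) dense matvec
# with roughly O(N * rank * log N) work. Kernel, leaf size, and rank are
# assumptions made for demonstration.
import numpy as np

def low_rank(block, rank):
    # Truncated SVD: block ~= U @ V with U of shape (m, rank), V of shape (rank, n).
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank, :]

def compress(K, leaf=64, rank=8):
    # Recursively split K: keep dense diagonal leaves, low-rank off-diagonal blocks.
    n = K.shape[0]
    if n <= leaf:
        return {"dense": K}
    m = n // 2
    return {"A": compress(K[:m, :m], leaf, rank),
            "D": compress(K[m:, m:], leaf, rank),
            "B": low_rank(K[:m, m:], rank),   # upper-right block
            "C": low_rank(K[m:, :m], rank)}   # lower-left block

def matvec(node, x):
    # Apply the compressed operator; off-diagonal work is O(n * rank) per level.
    if "dense" in node:
        return node["dense"] @ x
    (Ub, Vb), (Uc, Vc) = node["B"], node["C"]
    m = Ub.shape[0]
    y1 = matvec(node["A"], x[:m]) + Ub @ (Vb @ x[m:])
    y2 = matvec(node["D"], x[m:]) + Uc @ (Vc @ x[:m])
    return np.concatenate([y1, y2])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = np.sort(rng.random(1024))  # sorted 1D points => numerically low-rank off-diagonal blocks
    K = np.exp(-(pts[:, None] - pts[None, :]) ** 2 / 0.1) + 1e-3 * np.eye(1024)  # SPD kernel matrix
    x = rng.random(1024)
    H = compress(K)
    rel_err = np.linalg.norm(matvec(H, x) - K @ x) / np.linalg.norm(K @ x)
    print(f"relative matvec error: {rel_err:.2e}")

The point of the sketch is only the cost structure: each level touches every row once through rank-limited factors, which is the source of the O(N log N) scaling the abstract contrasts with the dense O(N^2) multiply.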
Pages: 15
Related Papers
25 in total
  • [1] [Anonymous]. Proceedings of the 19th International Conference on Machine Learning, 2002.
  • [2] [Anonymous]. Hierarchical Matrices, 2008.
  • [3] Benzi M. Acta Numerica, 2005, 14: 1. DOI: 10.1017/S0962492904000212.
  • [4] Byrd R. H., Chin G. M., Neveitt W., Nocedal J. On the Use of Stochastic Hessian Information in Optimization Methods for Machine Learning. SIAM Journal on Optimization, 2011, 21(3): 977-995.
  • [5] Cancedda N., Gaussier E., Goutte C., Renders J. M. Word-Sequence Kernels. Journal of Machine Learning Research, 2003, 3(6): 1059-1082.
  • [6] Chan E., Van Zee F. G., van de Geijn R., Bientinesi P., Quintana-Orti E. S., Quintana-Orti G. SuperMatrix: A Multithreaded Runtime Scheduling System for Algorithms-by-Blocks. PPoPP'08: Proceedings of the 2008 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008: 123-132.
  • [7] Ghysels P., Li X. S., Rouet F.-H., Williams S., Napov A. An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling. SIAM Journal on Scientific Computing, 2016, 38(5): S358-S384.
  • [8] Grasedyck L., Kriemann R., Le Borne S. Parallel Black Box H-LU Preconditioning for Elliptic Boundary Value Problems. Computing and Visualization in Science, 2008, 11(4-6): 273-291.
  • [9] Gray A. G. Advances in Neural Information Processing Systems, 2001, 13: 521.
  • [10] Greengard L. Fast Algorithms for Classical Physics. Science, 1994, 265(5174): 909-914.