Regularizing Irregularity: Bitmap-based and Portable Sparse Matrix Multiplication for Graph Data on GPUs

Cited by: 10
Authors
Zhang, Jianting [1 ]
Gruenwald, Le [2 ]
Affiliations
[1] CUNY City Coll, Dept Comp Sci, New York, NY 10031 USA
[2] Univ Oklahoma, Dept Comp Sci, Norman, OK 73072 USA
Source
GRADES-NDA '18: PROCEEDINGS OF THE 1ST ACM SIGMOD JOINT INTERNATIONAL WORKSHOP ON GRAPH DATA MANAGEMENT EXPERIENCES & SYSTEMS (GRADES) AND NETWORK DATA ANALYTICS (NDA) 2018 (GRADES-NDA 2018) | 2018年
Funding
U.S. National Science Foundation;
Keywords
bitmap-indexing; Sparse Matrix Multiplication; graph operations; data parallel design; GPU; VECTOR MULTIPLICATION; FORMAT;
DOI
10.1145/3210259.3210263
CLC number
TP301 [Theory, Methods];
Discipline code
081202 ;
Abstract
Graphs can be naturally represented as sparse matrices. The relationship between graph algorithms and linear algebra is well understood, and many graph problems can be abstracted as Sparse General Matrix-Matrix Multiplication (SpGEMM) operations. While many matrix storage formats, including bitmap-based ones, have been proposed for sparse matrices, they have mostly been evaluated on the simpler Sparse Matrix-Vector Multiplication (SpMV) problem. In this study, we have developed data-parallel algorithms that pair up bitmap-indexed sparse matrix blocks for SpGEMM, built on data-parallel primitives for portability. Experiments on the WebBase-1M dataset, which has more than one million rows and columns and three million non-zero values, show that our technique can square this large-scale sparse matrix in about 300 milliseconds on a 2013 GTX Titan GPU. This runtime is 2.4X faster than CUSP and 3.5X faster than bhSPARSE, two leading open-source SpGEMM packages. Furthermore, our bitmap-indexed sparse matrix blocks can be efficiently converted to small regular dense matrices, enabling the use of new hardware accelerations, such as tensor cores in Nvidia Volta GPUs and Google Tensor Processing Units (TPUs), for more efficient implementations.
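To make the abstract's central data structure concrete, the following is a minimal sketch (not the paper's actual implementation) of a bitmap-indexed sparse matrix block and its conversion to a regular dense block. The 8x8 block size, row-major bit ordering, and function names are assumptions chosen purely for illustration; the 64-bit bitmap records which positions are non-zero, and a packed array holds only the non-zero values.

```python
# Sketch of a bitmap-indexed sparse block: a 64-bit bitmap marks non-zero
# positions in an 8x8 tile; a packed list stores the non-zero values in
# row-major order. Block size and layout are illustrative assumptions.

BLOCK = 8  # assumed block dimension (8x8 -> 64-bit bitmap)

def to_bitmap_block(dense):
    """Pack a dense 8x8 list-of-lists into (bitmap, packed values)."""
    bitmap = 0
    values = []
    for r in range(BLOCK):
        for c in range(BLOCK):
            v = dense[r][c]
            if v != 0:
                bitmap |= 1 << (r * BLOCK + c)  # row-major bit position
                values.append(v)
    return bitmap, values

def to_dense_block(bitmap, values):
    """Expand (bitmap, packed values) back into a dense 8x8 block."""
    dense = [[0] * BLOCK for _ in range(BLOCK)]
    k = 0
    for r in range(BLOCK):
        for c in range(BLOCK):
            if (bitmap >> (r * BLOCK + c)) & 1:
                dense[r][c] = values[k]
                k += 1
    return dense

# Round trip on a block with three non-zeros
block = [[0] * BLOCK for _ in range(BLOCK)]
block[0][0], block[2][5], block[7][7] = 1.5, -2.0, 3.0
bm, vals = to_bitmap_block(block)
assert bin(bm).count("1") == 3          # popcount = number of non-zeros
assert to_dense_block(bm, vals) == block
```

The cheap bitmap-to-dense expansion is what the abstract alludes to: once a block is materialized as a small dense matrix, it can be fed to dense hardware units such as tensor cores.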
Pages: 8