Optimizing Sparse Matrix-Matrix Multiplication for the GPU

被引：81

作者：

Dalton, Steven ^{[1
]}

Olson, Luke ^{[1
]}

Bell, Nathan ^{[2
]}

机构：

[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA

[2] Google, Mountain View, CA 94043 USA

来源：

ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE | 2015年 / 41卷 / 04期

关键词：

Algorithms; Performance; Parallel; sparse; GPU; matrix-matrix; IMPLEMENTATION; PARALLELISM; PERFORMANCE;

D O I：

10.1145/2699470

中图分类号：

TP31 [计算机软件];

学科分类号：

081202 ; 0835 ;

摘要：

Sparse matrix-matrix multiplication (SpGEMM) is a key operation in numerous areas from information to the physical sciences. Implementing SpGEMM efficiently on throughput-oriented processors, such as the graphics processing unit (GPU), requires the programmer to expose substantial fine-grained parallelism while conserving the limited off-chip memory bandwidth. Balancing these concerns, we decompose the SpGEMM operation into three highly parallel phases: expansion, sorting, and contraction, and introduce a set of complementary bandwidth-saving performance optimizations. Our implementation is fully general and our optimization strategy adaptively processes the SpGEMM workload row-wise to substantially improve performance by decreasing the work complexity and utilizing the memory hierarchy more effectively.

引用

页数：20

共 28 条

[1]

[Anonymous], CS201003 U VIRG

[2]

[Anonymous], 2010, TESLA C2050 C2070 GP

[3]

Balay S, 1997, MODERN SOFTWARE TOOLS FOR SCIENTIFIC COMPUTING, P163

[4]

Bank R.E., 1993, Advances in Computa- tional Mathematics, V1, P127

[5]

Bell N, 2009, STUDENTS GUIDE TO THE MA TESOL, P1

[6]

Bell N., 2009, CUSP : Generic parallel algorithms for sparse matrix and graph computations

[7] EXPOSING FINE-GRAINED PARALLELISM IN ALGEBRAIC MULTIGRID METHODS [J].

Bell, Nathan ;

Dalton, Steven ;

Olson, Luke N. .

SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2012, 34 (04) :C123-C152

[8]

Buluç A, 2008, 2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8, P1876

[9] PARALLEL SPARSE MATRIX-MATRIX MULTIPLICATION AND INDEXING: IMPLEMENTATION AND EXPERIMENTS [J].

Buluc, Aydin ;

Gilbert, John R. .

SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2012, 34 (04) :C170-C191

[10] The Combinatorial BLAS: design, implementation, and applications [J].

Buluc, Aydin ;

Gilbert, John R. .

INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2011, 25 (04) :496-509

← 1 2 3 →