Improving the memory-system performance of sparse-matrix vector multiplication

Cited by: 86
Author
Toledo, S. [1]
Affiliation
[1] Xerox Corp, Palo Alto Res Ctr, Palo Alto, CA 94304 USA
DOI
10.1147/rd.416.0711
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Sparse-matrix vector multiplication is an important kernel that often runs inefficiently on superscalar RISC processors. This paper describes techniques that increase instruction-level parallelism and improve performance. The techniques include reordering to reduce cache misses (originally due to Das et al.), blocking to reduce the number of load instructions, and prefetching to prevent multiple load-store units from stalling simultaneously. Together they raise performance from about 40 MFLOPS (on a well-ordered matrix) to more than 100 MFLOPS on a machine with a peak rate of 266 MFLOPS. The techniques also apply to other superscalar RISC processors; for example, they have improved performance on a Sun UltraSPARC™ I workstation.
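The blocking idea in the abstract (fewer load instructions per nonzero) can be illustrated with a small sketch. The Python below is an illustration under stated assumptions, not the paper's implementation: it compares plain compressed sparse row (CSR) storage, which loads one column index per nonzero, with a hypothetical 1x2 blocked variant that groups entries into aligned column pairs (2k, 2k+1), stores one index per pair, and fills a missing partner with an explicit zero. The names `build_csr`, `build_bcsr_1x2`, `spmv_csr` and `spmv_bcsr_1x2` are made up for this example, and the returned load counter merely stands in for the integer loads a compiled kernel would issue; Python timings say nothing about MFLOPS. Reordering and prefetching are not shown.

```python
# Sketch only: hypothetical helpers illustrating 1x2 register blocking for
# sparse matrix-vector multiplication (SpMV); not the paper's actual code.
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[int, int, float]  # (row, col, value) of one nonzero


def build_csr(n_rows: int, triples: Sequence[Triple]):
    """Plain CSR: one column index stored (and later loaded) per nonzero."""
    rows: List[Dict[int, float]] = [defaultdict(float) for _ in range(n_rows)]
    for i, j, v in triples:
        rows[i][j] += v  # duplicate entries are summed
    row_ptr, col_idx, vals = [0], [], []
    for r in rows:
        for j in sorted(r):
            col_idx.append(j)
            vals.append(r[j])
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals


def build_bcsr_1x2(n_rows: int, triples: Sequence[Triple]):
    """Hypothetical 1x2 blocked CSR: one index per aligned column pair (2k, 2k+1).

    A pair holding only one nonzero is completed with an explicit zero (fill).
    """
    rows: List[Dict[int, List[float]]] = [{} for _ in range(n_rows)]
    for i, j, v in triples:
        block = rows[i].setdefault(j // 2, [0.0, 0.0])
        block[j % 2] += v
    row_ptr, bcol_idx, vals = [0], [], []
    for r in rows:
        for bj in sorted(r):
            bcol_idx.append(bj)
            vals.extend(r[bj])  # the two values of a block are contiguous
        row_ptr.append(len(bcol_idx))
    return row_ptr, bcol_idx, vals


def spmv_csr(row_ptr, col_idx, vals, x):
    """y = A x; also returns how many column indices were loaded."""
    y, index_loads = [], 0
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            index_loads += 1
            acc += vals[k] * x[col_idx[k]]
        y.append(acc)
    return y, index_loads


def spmv_bcsr_1x2(row_ptr, bcol_idx, vals, x):
    """y = A x over 1x2 blocks: one index load feeds two multiply-adds."""
    xp = list(x) + [0.0] * (len(x) % 2)  # pad so xp[2*bj + 1] always exists
    y, index_loads = [], 0
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            c = 2 * bcol_idx[k]
            index_loads += 1
            acc += vals[2 * k] * xp[c] + vals[2 * k + 1] * xp[c + 1]
        y.append(acc)
    return y, index_loads
```

For the 3x3 tridiagonal matrix with 4 on the diagonal and 1 on both off-diagonals, and x = (1, 2, 3), both routines return y = [6.0, 12.0, 14.0]; the CSR loop performs 7 index loads and the blocked loop 5, at the price of 3 stored zeros. Whether blocking pays off depends on how many nonzeros already come in adjacent column pairs, since every explicit zero costs an extra multiply-add.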
Pages: 711-725
Number of pages: 15
References (22 in total)
[1] Agarwal, R. C. Proceedings, Supercomputing '92 (Cat. No. 92CH3216-9), 1992, p. 32. DOI: 10.1109/SUPERC.1992.236712
[2] Agarwal, R. C.; Alpern, B.; Carter, L.; Gustavson, F. G.; Klepacki, D. J.; Lawrence, R.; Zubair, M. High-performance parallel implementations of the NAS kernel benchmarks on the IBM SP2. IBM Systems Journal, 1995, 34(2): 263-272.
[3] Agarwal, R. C.; Gustavson, F. G.; Zubair, M. Improving performance of linear algebra algorithms for dense matrices, using algorithmic prefetch. IBM Journal of Research and Development, 1994, 38(3): 265-275.
[4] [Anonymous]. P ACM IEEE C SUP SUP, 1991.
[5] Balay, S. PETSc 2.0 Users Manual, 1996.
[6] Barrett, R. Templates for the Solution of Linear Systems, 1993.
[7] Burgess, D. A. Technical report 9506, Oxford University Computing Laboratory, 1995.
[8] Cuthill, E. Proceedings of the 1969 24th National Conference, 1969, p. 157. DOI: 10.1145/800195.805928
[9] Das, R.; Mavriplis, D. J.; Saltz, J.; Gupta, S.; Ponnusamy, R. Design and implementation of a parallel unstructured Euler solver using software primitives. AIAA Journal, 1994, 32(3): 489-496.
[10] Edmondson, J. H.; Rubinfeld, P.; Preston, R.; Rajagopalan, V. Superscalar instruction execution in the 21164 Alpha microprocessor. IEEE Micro, 1995, 15(2): 33-43.