A framework for high-performance matrix multiplication based on hierarchical abstractions, algorithms and optimized low-level kernels

被引:30
作者
Valsalam, V [1 ]
Skjellum, A [1 ]
机构
[1] Mississippi State Univ, Dept Comp Sci, High Performance Comp Lab, Mississippi State, MS 39762 USA
关键词
matrix multiplication; hierarchical matrix storage; Morton order; polyalgorithms; Strassen's algorithm; kernel interface;
D O I
10.1002/cpe.630
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Despite extensive research, optimal performance has not easily been available previously for matrix multiplication (especially for large matrices) on most architectures because of the lack of a structured approach and the limitations imposed by matrix storage formats. A simple but effective framework is presented here that lays the foundation for building high-performance matrix-multiplication codes in a structured, portable and efficient manner. The resulting codes are validated on three different representative RISC and CISC architectures on which they significantly outperform highly optimized libraries such as ATLAS and other competing methodologies reported in the literature. The main component of the proposed approach is a hierarchical storage format that efficiently generalizes the applicability of the memory hierarchy friendly Morton ordering to arbitrary-sized matrices. The storage format supports polyalgorithms, which are shown here to be essential for obtaining the best possible performance for a range of problem sizes. Several algorithmic advances are made in this paper, including an oscillating iterative algorithm for matrix multiplication and a variable recursion cutoff criterion for Strassen's algorithm. The authors expose the need to standardize linear algebra kernel interfaces, distinct from the BLAS, for writing portable high-performance code. These kernel routines operate on small blocks that fit in the L1 cache. The performance advantages of the proposed framework can be effectively delivered to new and existing applications through the use of object-oriented or compiler-based approaches. Copyright (C) 2002 John Wiley Sons, Ltd.
引用
收藏
页码:805 / 839
页数:35
相关论文
共 57 条
[41]   FINDING NEIGHBORS OF EQUAL SIZE IN LINEAR QUADTREES AND OCTREES IN CONSTANT TIME [J].
SCHRACK, G .
CVGIP-IMAGE UNDERSTANDING, 1992, 55 (03) :221-230
[42]  
SCHREIBER R, 1990, CS90108 U TENN
[43]  
SIEK J, 1998, WORKSH PAR OBJ OR SC
[44]  
SIEK JG, 1998, P INT S COMP OBJ OR
[45]  
SKJELLUM A, 1997, DRAFT DOCUMENT BLAS
[46]  
STOCCO L, 1995, P IEEE PAC RIM C COM
[47]   GAUSSIAN ELIMINATION IS NOT OPTIMAL [J].
STRASSEN, V .
NUMERISCHE MATHEMATIK, 1969, 13 (04) :354-&
[48]  
THOTTETHODI M, 1998, P SUP 98 NOV
[49]  
VALSALAM V, 2000, MSSUCOEERC0005
[50]  
VALSALAM V, 2001, LINEAR ALGEBRA BASED