Modeling;
High performance;
Energy consumption;
Matrix multiplication;
Linear algebra;
SET;
DOI:
10.1016/j.simpat.2015.04.003
Chinese Library Classification:
TP39 [Computer applications];
Discipline classification code:
081203 ;
0835 ;
Abstract:
We present accurate piece-wise models for the time and energy costs of high-performance implementations of both matrix multiplication (GEMM) and the triangular system solve with multiple right-hand sides (TRSM) on x86 architectures. Our methodology decouples the costs of the floating-point arithmetic and data movement occurring in the higher levels of the cache hierarchy from the costs of packing and data transfers between main memory and the L2/L3 cache. A careful analytical study of the data transfers, combined with an architecture-specific calibration of the cost per operation, then yields the components needed to assemble piece-wise models that accurately estimate the performance of GEMM and TRSM on x86 processors. Our experimental results on an Intel Xeon E5-2620 processor confirm the accuracy of this approach: averaged over different shapes of GEMM and TRSM, the relative errors are around 1.5% and 4.5%, respectively, for both time and energy. © 2015 Elsevier B.V. All rights reserved.
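The general idea of a decoupled, piece-wise cost model can be illustrated with a minimal Python sketch. This is not the model from the paper: the `Calibration` fields, the two-regime rule based on the operand footprint, and all numeric values are assumptions chosen only to show how an arithmetic term and a data-transfer term might be combined, and how a relative error could be computed against a measurement.

```python
from dataclasses import dataclass


@dataclass
class Calibration:
    """Hypothetical per-operation costs; real values would come from
    architecture-specific calibration runs, which are not given here."""
    sec_per_flop: float        # cost of one floating-point operation
    sec_per_byte_cache: float  # cost per byte when operands fit in cache
    sec_per_byte_mem: float    # cost per byte when operands spill to main memory
    cache_bytes: int           # assumed capacity threshold separating the two regimes


def gemm_time(m: int, n: int, k: int, cal: Calibration, elem_bytes: int = 8) -> float:
    """Estimate the time of C += A * B with A (m x k), B (k x n), C (m x n).

    The arithmetic term and the data-transfer term are computed separately
    and then added. The transfer cost is piece-wise: it switches rate
    depending on whether the combined operand footprint fits in the cache.
    """
    flops = 2.0 * m * n * k                          # one multiply and one add per inner step
    footprint = elem_bytes * (m * k + k * n + m * n)  # bytes occupied by A, B, and C
    if footprint <= cal.cache_bytes:
        per_byte = cal.sec_per_byte_cache
    else:
        per_byte = cal.sec_per_byte_mem
    return flops * cal.sec_per_flop + footprint * per_byte


def relative_error(predicted: float, measured: float) -> float:
    """Relative error of a prediction with respect to a measured value."""
    return abs(predicted - measured) / measured
```

An energy estimate could follow the same structure with per-operation energy costs in place of time costs; that substitution is likewise an assumption of this sketch rather than a description of the published model.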