Time and energy modeling of high-performance Level-3 BLAS on x86 architectures

Cited by: 3
Authors
Alonso, Pedro [2]
Catalan, Sandra [1]
Igual, Francisco D. [3]
Mayo, Rafael [1]
Rodriguez-Sanchez, Rafael [1]
Quintana-Orti, Enrique S. [1]
Affiliations
[1] Univ Jaume 1, Dept Ingn & Ciencia Comp, Castellon de La Plana, Spain
[2] Univ Politecn Valencia, Dept Sistemas Informat & Comp, E-46022 Valencia, Spain
[3] Univ Complutense Madrid, Dept Arquitectura Comp & Automat, E-28040 Madrid, Spain
Keywords
Modeling; High performance; Energy consumption; Matrix multiplication; Linear algebra; SET;
DOI
10.1016/j.simpat.2015.04.003
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
We present accurate piece-wise models for the time and energy costs of high-performance implementations of both the matrix multiplication (GEMM) and the triangular system solve with multiple right-hand sides (TRSM) on x86 architectures. Our methodology decouples the costs due to the floating-point arithmetic and data movement occurring in the higher levels of the cache hierarchy from those of packing and data transfers between the main memory and the L2/L3 cache. A careful analytical study of the data transfers, in combination with an architecture-specific calibration of the costs per operation, then yields the components needed to assemble piece-wise models for the accurate estimation of the performance of GEMM and TRSM on x86 processors. Our experimental results on an Intel Xeon E5-2620 processor confirm the accuracy of this approach, which reports average relative errors of around 1.5% for GEMM and 4.5% for TRSM across different operand shapes, for both time and energy. (C) 2015 Elsevier B.V. All rights reserved.
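The decoupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the blocked-algorithm traffic counts, the block sizes nc and kc, the breakpoint table, and every cost constant are hypothetical values chosen only to show how an arithmetic term and a memory-transfer term might be combined into a piece-wise estimate.

```python
import math

# Illustrative sketch only. The traffic formula, block sizes, and all cost
# constants below are assumptions, not values or formulas from the paper.

def gemm_flops(m, n, k):
    """Floating-point operations for C += A * B, with A (m x k) and B (k x n)."""
    return 2 * m * n * k

def gemm_memory_elements(m, n, k, nc, kc):
    """Elements moved between main memory and cache for an assumed blocked GEMM.

    Assumption: B is packed once, A is repacked for each column panel of
    width nc, and C is read and written once per k-block of depth kc.
    """
    a_packed = m * k * math.ceil(n / nc)
    b_packed = k * n
    c_traffic = 2 * m * n * math.ceil(k / kc)
    return a_packed + b_packed + c_traffic

def piecewise_cost(k, breakpoints):
    """Return the calibrated per-flop cost of the first interval with k <= upper."""
    for upper, cost in breakpoints:
        if k <= upper:
            return cost
    return breakpoints[-1][1]  # fall back to the last interval

def estimate_gemm(m, n, k, flop_cost_table, mem_cost, nc=4096, kc=256):
    """Sum of the arithmetic term and the memory-transfer term."""
    arithmetic = gemm_flops(m, n, k) * piecewise_cost(k, flop_cost_table)
    memory = gemm_memory_elements(m, n, k, nc, kc) * mem_cost
    return arithmetic + memory
```

Under these assumptions the same function could serve for time or for energy: passing per-operation costs in seconds gives a time estimate, and passing costs in joules gives an energy estimate. The calibration values themselves would have to be measured on the target processor.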
Pages: 77-94
Number of pages: 18