Time and energy modeling of high-performance Level-3 BLAS on x86 architectures

Times cited: 3

Authors:

Alonso, Pedro [2]

Catalan, Sandra [1]

Igual, Francisco D. [3]

Mayo, Rafael [1]

Rodriguez-Sanchez, Rafael [1]

Quintana-Orti, Enrique S. [1]

Affiliations:

[1] Univ Jaume 1, Dept Ingn & Ciencia Comp, Castellon de La Plana, Spain

[2] Univ Politecn Valencia, Dept Sistemas Informat & Comp, E-46022 Valencia, Spain

[3] Univ Complutense Madrid, Dept Arquitectura Comp & Automat, E-28040 Madrid, Spain

Source:

SIMULATION MODELLING PRACTICE AND THEORY | 2015 / Vol. 55

Keywords:

Modeling; High performance; Energy consumption; Matrix multiplication; Linear algebra; SET;

DOI:

10.1016/j.simpat.2015.04.003

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

We present accurate piece-wise models for the time and energy costs of high-performance implementations of both the matrix multiplication (GEMM) and the triangular system solve with multiple right-hand sides (TRSM) on x86 architectures. Our methodology decouples the costs due to the floating-point arithmetic and data movement occurring in the higher levels of the cache hierarchy from those due to packing and data transfers between main memory and the L2/L3 cache. A careful analytical study of the data transfers, combined with an architecture-specific calibration of the per-operation costs, then yields the components needed to assemble piece-wise models that accurately estimate the performance of GEMM and TRSM on x86 processors. Our experimental results on an Intel Xeon E5-2620 processor confirm the accuracy of this approach, with average relative errors for different shapes of GEMM and TRSM of around 1.5% and 4.5%, respectively, for both time and energy. (C) 2015 Elsevier B.V. All rights reserved.
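To make the decomposition described in the abstract more concrete, the following Python sketch estimates the time and energy of a GEMM call by separating the micro-kernel arithmetic from the packing costs. It is a simplified illustration, not the authors' model. It assumes the GotoBLAS-style loop ordering (each kc x nc panel of B packed once, A re-packed for every nc-wide column panel), and the names `Calibration`, `gemm_cost` and `mean_relative_error`, the block size `nc`, and all per-operation costs and power values are hypothetical placeholders that would have to be calibrated on the target processor.

```python
import math
from dataclasses import dataclass


@dataclass
class Calibration:
    """Hypothetical per-operation costs and phase powers.

    Real values would come from architecture-specific micro-benchmarks.
    """
    t_flop: float    # seconds per flop in the micro-kernel (operands in cache)
    t_pack_a: float  # seconds per element copied when packing A
    t_pack_b: float  # seconds per element copied when packing B
    p_flop: float    # average power (W) while the micro-kernel runs
    p_pack: float    # average power (W) while packing


def gemm_cost(m, n, k, cal, nc=4096):
    """Estimate (time in s, energy in J) of C += A @ B, A: m x k, B: k x n.

    Assumes GotoBLAS-style blocking: every kc x nc panel of B is packed
    exactly once, and all of A is re-packed for each nc-wide panel of B.
    The block size nc is a placeholder, not a tuned value.
    """
    # Arithmetic component: 2mnk flops, independent of the blocking.
    t_arith = 2.0 * m * n * k * cal.t_flop
    # Packing B: all k*n elements are copied once in total.
    t_pack_b = k * n * cal.t_pack_b
    # Packing A: all m*k elements are copied once per nc-wide panel.
    # The ceil() makes the model piece-wise (a step at each multiple of nc).
    t_pack_a = m * k * math.ceil(n / nc) * cal.t_pack_a
    t_pack = t_pack_a + t_pack_b
    time = t_arith + t_pack
    # Energy: each phase's time weighted by that phase's average power.
    energy = t_arith * cal.p_flop + t_pack * cal.p_pack
    return time, energy


def relative_error(model, measured):
    """Relative error of a model prediction against a measurement."""
    return abs(model - measured) / measured


def mean_relative_error(pairs):
    """Average relative error over (model, measured) pairs, e.g. one per shape."""
    return sum(relative_error(m, x) for m, x in pairs) / len(pairs)


if __name__ == "__main__":
    cal = Calibration(t_flop=1e-10, t_pack_a=1e-9, t_pack_b=1e-9,
                      p_flop=80.0, p_pack=60.0)
    for n in (4096, 4097):
        t, e = gemm_cost(1000, n, 1000, cal)
        print(f"n={n}: {t:.6f} s, {e:.3f} J")
```

With these made-up constants, n = 4097 is only one column larger than n = 4096, yet the time estimate rises by roughly 1.2 ms, about 1 ms of which comes from packing A a second time. This step at multiples of `nc` is one simple way a cost model becomes piece-wise. In a calibrated model, the per-flop and per-element costs and the phase powers would be fitted to measurements rather than chosen by hand.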


Pages: 77-94

Number of pages: 18
