Time and energy modeling of high-performance Level-3 BLAS on x86 architectures

Cited by: 3
Authors
Alonso, Pedro [2 ]
Catalan, Sandra [1 ]
Igual, Francisco D. [3 ]
Mayo, Rafael [1 ]
Rodriguez-Sanchez, Rafael [1 ]
Quintana-Orti, Enrique S. [1 ]
Affiliations
[1] Univ Jaume 1, Dept Ingn & Ciencia Comp, Castellon de La Plana, Spain
[2] Univ Politecn Valencia, Dept Sistemas Informat & Comp, E-46022 Valencia, Spain
[3] Univ Complutense Madrid, Dept Arquitectura Comp & Automat, E-28040 Madrid, Spain
Keywords
Modeling; High performance; Energy consumption; Matrix multiplication; Linear algebra; SET;
DOI
10.1016/j.simpat.2015.04.003
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
We present accurate piece-wise models for the time and energy costs of high-performance implementations of both the matrix multiplication (GEMM) and the triangular system solve with multiple right-hand sides (TRSM) on x86 architectures. Our methodology decouples the costs due to the floating-point arithmetic/data movement occurring in the higher levels of the cache hierarchy from those of packing/data transfers between the main memory and the L2/L3 cache. A careful analytical study of the data transfers, in combination with an architecture-specific calibration of the costs per operation, then yields the components needed to assemble piece-wise models for the accurate estimation of the performance of GEMM and TRSM on x86 processors. Our experimental results on an Intel Xeon E5-2620 processor confirm the accuracy of this approach, which reports relative errors for different shapes of GEMM and TRSM that are, respectively, around 1.5% and 4.5% on average for both time and energy. (C) 2015 Elsevier B.V. All rights reserved.
Pages: 77-94
Number of pages: 18
Related papers
50 in total
  • [1] High-performance implementation of the level-3 BLAS
    Goto, Kazushige
    Van De Geijn, Robert
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2008, 35 (01): : 1 - 14
  • [2] FT-BLAS: A Fault Tolerant High Performance BLAS Implementation on x86 CPUs
    Zhai, Yujia
    Giem, Elisabeth
    Zhao, Kai
    Liu, Jinyang
    Huang, Jiajun
    Wong, Bryan M.
    Shelton, Christian R.
    Chen, Zizhong
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2023, 34 (12) : 3207 - 3223
  • [3] Zen: An Energy-Efficient High-Performance x86 Core
    Singh, Teja
    Schaefer, Alex
    Rangarajan, Sundar
    John, Deepesh
    Henrion, Carson
    Schreiber, Russell
    Rodriguez, Miguel
    Kosonocky, Stephen
    Naffziger, Samuel
    Novak, Amy
    IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2018, 53 (01) : 102 - 114
  • [4] Modelling the Performance of the Gaussian Chemistry Code on x86 Architectures
    Antony, Joseph
    Risch, Mike J.
    Rendell, Alistair P.
    MODELING, SIMULATION AND OPTIMIZATION OF COMPLEX PROCESSES, 2008, : 49 - +
  • [5] Zen: A Next-Generation High-Performance x86 Core
    Singh, Teja
    Rangarajan, Sundar
    John, Deepesh
    Henrion, Carson
    Southard, Shane
    McIntyre, Hugh
    Novak, Amy
    Kosonocky, Stephen
    Jotwani, Ravi
    Schaefer, Alex
    Chang, Edward
    Bell, Joshua
    Co, Michael
    2017 IEEE INTERNATIONAL SOLID-STATE CIRCUITS CONFERENCE (ISSCC), 2017, : 52 - 52
  • [6] Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs
    Li, Zhihao
    Jia, Haipeng
    Zhang, Yunquan
    Chen, Tun
    Yuan, Liang
    Vuduc, Richard
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31 (08) : 1925 - 1941
  • [7] A SET OF HIGH-PERFORMANCE LEVEL-3 BLAS STRUCTURED AND TUNED FOR THE IBM 3090-VF AND IMPLEMENTED IN FORTRAN-77
    LING, P
    JOURNAL OF SUPERCOMPUTING, 1993, 7 (03): : 323 - 355
  • [8] A high-performance implementation of atomistic spin dynamics simulations on x86 CPUs
    Chen, Hongwei
    Zhai, Yujia
    Turner, Joshua J.
    Feiguin, Adrian
    COMPUTER PHYSICS COMMUNICATIONS, 2023, 291
  • [9] Advanced processes and architectures boost x86 CPU performance levels
    Bursky, D
    ELECTRONIC DESIGN, 1998, 46 (26) : 69 - +
  • [10] Performance benchmark of LHCb code on state-of-the-art x86 architectures
    Perez, D. H. Campora
    Neufeld, N.
    Schwemmer, R.
    21ST INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS (CHEP2015), PARTS 1-9, 2015, 664