Automatic tuning to performance modelling of matrix polynomials on multicore and multi-GPU systems

Cited by: 2
Authors
Boratto, Murilo [1]
Alonso, Pedro [2]
Gimenez, Domingo [3]
Lastovetsky, Alexey [4]
Affiliations
[1] Univ Estado Bahia, Nucleo Arquitetura Comp & Sistemas Operacionais, Salvador, BA, Brazil
[2] Univ Politecn Valencia, Dept Sistemas Informat & Comp, Valencia, Spain
[3] Univ Murcia, Dept Sistemas Informat, Murcia, Spain
[4] Univ Coll Dublin, Sch Comp Sci, Heterogeneous Comp Lab, Dublin, Ireland
Source
JOURNAL OF SUPERCOMPUTING | 2017 / Vol. 73 / Issue 01
Keywords
Automatic tuning; Matrix polynomials; Performance; Multicore; Multi-GPU;
DOI
10.1007/s11227-016-1694-y
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Automatic tuning methodologies have been used in the design of routines in recent years. Their goal is to develop routines that automatically adapt to the conditions of the underlying computational system, so that efficient executions are obtained regardless of the end user's experience. This paper explores programming routines that adapt automatically to the conditions of the computational system by means of these methodologies. In particular, we have worked on the evaluation of matrix polynomials on multicore and multi-GPU systems as a target application. This application is very useful for computing matrix functions such as the sine or cosine, but it is also very time-consuming, because its basic computational kernel, matrix multiplication, is carried out many times. Using all the available resources within a node in an easy and efficient way is crucial for the end user.
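To illustrate why matrix multiplication dominates the cost, the following is a minimal sketch of evaluating a matrix polynomial P(X) = c_0 I + c_1 X + ... + c_d X^d with Horner's rule. It is a standard textbook scheme, not necessarily the evaluation method used in the paper, and it runs on a single CPU core with plain Python lists rather than on multicore or multi-GPU hardware. The names `matmul`, `add_scaled_identity`, and `matrix_polynomial` are hypothetical and chosen only for this example.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product a @ b: the basic kernel the abstract refers to."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def add_scaled_identity(a: Matrix, c: float) -> Matrix:
    """Return a + c * I for a square matrix a."""
    n = len(a)
    return [[a[i][j] + (c if i == j else 0.0) for j in range(n)] for i in range(n)]


def matrix_polynomial(coeffs: Sequence[float], x: Matrix) -> Tuple[Matrix, int]:
    """Evaluate sum(coeffs[k] * x**k) by Horner's rule.

    Returns the result and the number of matrix products performed.
    Assumption for this sketch: coeffs[k] multiplies x**k and x is square.
    """
    if not coeffs:
        raise ValueError("at least one coefficient is required")
    n = len(x)
    # Start from c_d * I, then repeatedly apply P <- P @ x + c_k * I.
    result = [[coeffs[-1] if i == j else 0.0 for j in range(n)] for i in range(n)]
    products = 0
    for c in reversed(coeffs[:-1]):
        result = add_scaled_identity(matmul(result, x), c)
        products += 1
    return result, products


if __name__ == "__main__":
    x = [[1.0, 1.0], [0.0, 1.0]]
    value, products = matrix_polynomial([1.0, 2.0, 3.0], x)  # I + 2X + 3X^2
    print(value, products)
```

In this sketch a polynomial of degree d costs d matrix products, each of cubic cost in the matrix dimension, which is why the choice of how to run that one kernel on the available cores and GPUs matters so much for total run time.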
Pages: 227 - 239
Page count: 13
Related papers
50 in total
  • [31] Scaling up MapReduce-based Big Data Processing on Multi-GPU systems
    Jiang, Hai
    Chen, Yi
    Qiao, Zhi
    Weng, Tien-Hsiung
    Li, Kuan-Ching
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2015, 18 (01) : 369 - 383
  • [32] MULTI-GPU DGEMM AND HIGH PERFORMANCE LINPACK ON HIGHLY ENERGY-EFFICIENT CLUSTERS
    Rohr, David
    Bach, Matthias
    Kretz, Matthias
    Lindenstruth, Volker
    IEEE MICRO, 2011, 31 (05) : 18 - 26
  • [33] Locality-aware Optimizations for Improving Remote Memory Latency in Multi-GPU Systems
    Belayneh, Leul
    Ye, Haojie
    Chen, Kuan-Yu
    Blaauw, David
    Mudge, Trevor
    Dreslinski, Ronald
    Talati, Nishil
    PROCEEDINGS OF THE 2022 31ST INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PACT 2022, 2022 : 304 - 316
  • [34] XKBlas: a High Performance Implementation of BLAS-3 Kernels on Multi-GPU Server
    Gautier, Thierry
    Lima, Joao V. F.
    2020 28TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP 2020), 2020 : 1 - 8
  • [35] REC: Enhancing fine-grained cache coherence protocol in multi-GPU systems
    Ko, Gun
    Lee, Jiwon
    Kal, Hongju
    Lee, Hyunwuk
    Ro, Won Woo
    JOURNAL OF SYSTEMS ARCHITECTURE, 2025, 160
  • [36] Scaling up MapReduce-based Big Data Processing on Multi-GPU systems
    Hai Jiang
    Yi Chen
    Zhi Qiao
    Tien-Hsiung Weng
    Kuan-Ching Li
    Cluster Computing, 2015, 18 : 369 - 383
  • [37] Exploring Fine-Grained Task-based Execution on Multi-GPU Systems
    Chen, Long
    Villa, Oreste
    Gao, Guang R.
    2011 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2011 : 386 - 394
  • [38] CuSNMF: A Sparse Non-negative Matrix Factorization Approach for Large-Scale Collaborative Filtering Recommender Systems on Multi-GPU
    Li, Hao
    Li, Kenli
    Peng, Jiwu
    Li, Keqin
    2017 15TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS AND 2017 16TH IEEE INTERNATIONAL CONFERENCE ON UBIQUITOUS COMPUTING AND COMMUNICATIONS (ISPA/IUCC 2017), 2017 : 1144 - 1151
  • [39] P-Cloth: Interactive Complex Cloth Simulation on Multi-GPU Systems using Dynamic Matrix Assembly and Pipelined Implicit Integrators
    Li, Cheng
    Tang, Min
    Tong, Ruofeng
    Cai, Ming
    Zhao, Jieyi
    Manocha, Dinesh
    ACM TRANSACTIONS ON GRAPHICS, 2020, 39 (06):
  • [40] Improving the Performance of Cardiac Simulations in a Multi-GPU Architecture Using a Coalesced Data and Kernel Scheme
    Cordeiro, Raphael Pereira
    Oliveira, Rafael Sachetto
    dos Santos, Rodrigo Weber
    Lobosco, Marcelo
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2016, 2016, 10048 : 546 - 553