Automatic tuning to performance modelling of matrix polynomials on multicore and multi-GPU systems

Cited by: 2

Authors:

Boratto, Murilo [1]

Alonso, Pedro [2]

Gimenez, Domingo [3]

Lastovetsky, Alexey [4]

Affiliations:

[1] Univ Estado Bahia, Nucleo Arquitetura Comp & Sistemas Operacionais, Salvador, BA, Brazil

[2] Univ Politecn Valencia, Dept Sistemas Informat & Comp, Valencia, Spain

[3] Univ Murcia, Dept Sistemas Informat, Murcia, Spain

[4] Univ Coll Dublin, Sch Comp Sci, Heterogeneous Comp Lab, Dublin, Ireland

Source:

JOURNAL OF SUPERCOMPUTING | 2017 / Vol. 73 / Issue 01

Keywords:

Automatic tuning; Matrix polynomials; Performance; Multicore; Multi-GPU;

DOI:

10.1007/s11227-016-1694-y

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Automatic tuning methodologies have been used in the design of routines in recent years. The goal of these methodologies is to develop routines which automatically adapt to the conditions of the underlying computational system so that efficient executions are obtained independently of the end-user experience. This paper aims to explore programming routines that can automatically be adapted to the computational system conditions thanks to these automatic tuning methodologies. In particular, we have worked on the evaluation of matrix polynomials on multicore and multi-GPU systems as a target application. This application is very useful for the computation of matrix functions like the sine or cosine but, at the same time, the application is very time consuming since the basic computational kernel, which is the matrix multiplication, is carried out many times. The use of all available resources within a node in an easy and efficient way is crucial for the end user.


Pages: 227-239

Number of pages: 13

