A Fully Empirical Autotuned Dense QR Factorization for Multicore Architectures

Authors
Agullo, Emmanuel
Dongarra, Jack
Nath, Rajib
Tomov, Stanimire
Source
EURO-PAR 2011 PARALLEL PROCESSING, PT 2 | 2011 / Volume 6853
Keywords
SOFTWARE
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Tuning numerical libraries has become more difficult over time, as systems get more sophisticated. In particular, modern multicore machines make the behaviour of algorithms hard to forecast and model. In this paper, we tackle the issue of tuning a dense QR factorization on multicore architectures using a fully empirical approach. We exhibit a few strong empirical properties that enable us to efficiently prune the search space. Our method is automatic, fast and reliable. The tuning process is indeed fully performed at install time in less than one hour and ten minutes on five out of seven platforms. We achieve an average performance varying from 97% to 100% of the optimum performance depending on the platform. This work is a basis for autotuning the PLASMA library and enabling easy performance portability across hardware systems.
Pages: 194-205
Page count: 12