Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems

Times cited: 29
Authors
Agullo, Emmanuel [1, 5]
Buttari, Alfredo [2, 6]
Guermouche, Abdou [3, 7]
Lopez, Florent [4, 6, 8]
Affiliations
[1] Inria LaBRI, Talence, France
[2] CNRS IRIT, Toulouse, France
[3] Univ Bordeaux, Talence, France
[4] UPS IRIT, Toulouse, France
[5] Ctr Rech Inria Bordeaux Sud Ouest, 200 Ave Vieille Tour, F-33405 Talence, France
[6] ENSEEIHT IRIT, 2 Rue Camichel, F-31071 Toulouse, France
[7] LaBRI, 351 Cours Liberat, F-33405 Talence, France
[8] STFC Rutherford Appleton Lab, Harwell Campus, Didcot OX11 0QX, Oxon, England
Source
ACM Transactions on Mathematical Software | 2016 / Vol. 43 / No. 2
Keywords
Algorithms; Performance; Sparse direct solvers; multicores; runtime systems; communication-avoiding; memory-aware; QR factorization
DOI
10.1145/2898348
CLC number (Chinese Library Classification)
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
To face the advent of multicore processors and the ever-increasing complexity of hardware architectures, programming models based on DAG parallelism have regained popularity in the high-performance scientific computing community. Modern runtime systems offer a programming interface that complies with this paradigm, together with powerful engines for scheduling the tasks into which the application is decomposed. These tools have already proved their effectiveness on a number of dense linear algebra applications. This article evaluates the usability and effectiveness of runtime systems based on the Sequential Task Flow model for complex applications, namely sparse matrix multifrontal factorizations, which feature extremely irregular workloads, tasks of different granularities and characteristics, and variable memory consumption. Most importantly, it shows how this parallel programming model eases the development of complex features that benefit both the performance and the memory consumption of sparse direct solvers. We illustrate our discussion with the multifrontal QR factorization running on top of the StarPU runtime system.
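The Sequential Task Flow (STF) model named in the abstract can be illustrated with a toy runtime: the program submits tasks one after another in ordinary sequential order, declaring which data each task reads or writes, and the runtime infers the DAG from conflicting accesses (read-after-write, write-after-read, write-after-write). This is a minimal sketch under stated assumptions, not the authors' code and not StarPU's API: `ToySTF`, `submit`, `waves`, `wait_all` and the front names `f1`, `f2`, `f3` are hypothetical. StarPU's own C interface submits tasks with calls such as `starpu_task_insert` and access modes such as `STARPU_R`, `STARPU_W` and `STARPU_RW`. Running the DAG wave by wave is a simplification of a real dynamic scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Access modes, in the spirit of StarPU's STARPU_R / STARPU_W / STARPU_RW.
R, W, RW = "R", "W", "RW"


@dataclass
class Task:
    tid: int
    name: str
    func: Callable[[], None]
    deps: Set[int] = field(default_factory=set)


class ToySTF:
    """Hypothetical Sequential Task Flow runtime (illustration only, not StarPU).

    Tasks are submitted in sequential program order together with the data
    handles they touch; the DAG is inferred from those accesses."""

    def __init__(self) -> None:
        self.tasks: List[Task] = []
        self.last_writer: Dict[str, int] = {}
        self.readers: Dict[str, List[int]] = {}  # readers since the last write

    def submit(self, name: str, func: Callable[[], None],
               accesses: Sequence[Tuple[str, str]]) -> int:
        task = Task(len(self.tasks), name, func)
        for handle, mode in accesses:
            if handle in self.last_writer:
                # Read-after-write and write-after-write hazards.
                task.deps.add(self.last_writer[handle])
            if mode in (W, RW):
                # Write-after-read: wait for every reader since the last write.
                task.deps.update(self.readers.get(handle, []))
                self.last_writer[handle] = task.tid
                self.readers[handle] = []
            else:
                self.readers.setdefault(handle, []).append(task.tid)
        task.deps.discard(task.tid)  # a task never depends on itself
        self.tasks.append(task)
        return task.tid

    def waves(self) -> List[List[Task]]:
        """Group tasks so that each task's dependencies lie in earlier waves."""
        level: Dict[int, int] = {}
        for t in self.tasks:  # deps always have smaller ids, so one pass works
            level[t.tid] = 1 + max((level[d] for d in t.deps), default=-1)
        out: List[List[Task]] = [[] for _ in range(max(level.values(), default=-1) + 1)]
        for t in self.tasks:
            out[level[t.tid]].append(t)
        return out

    def wait_all(self, workers: int = 4) -> None:
        """Run the DAG; tasks within one wave are independent and run concurrently."""
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for wave in self.waves():
                # list() waits for the whole wave (and re-raises errors) before moving on.
                list(pool.map(lambda t: t.func(), wave))


log: List[str] = []
stf = ToySTF()


def record(msg: str) -> Callable[[], None]:
    return lambda: log.append(msg)


# Hypothetical elimination tree: fronts f1 and f2 are children of front f3.
for f in ("f1", "f2", "f3"):
    stf.submit(f"init {f}", record(f"init {f}"), [(f, W)])
stf.submit("factorize f1", record("factorize f1"), [("f1", RW)])
stf.submit("factorize f2", record("factorize f2"), [("f2", RW)])
stf.submit("assemble f1->f3", record("assemble f1->f3"), [("f1", R), ("f3", RW)])
stf.submit("assemble f2->f3", record("assemble f2->f3"), [("f2", R), ("f3", RW)])
stf.submit("factorize f3", record("factorize f3"), [("f3", RW)])
stf.wait_all()

print([[t.name for t in w] for w in stf.waves()])
```

The submission loop reads like sequential code, yet the two child fronts are initialized and factorized concurrently because their handles never conflict. Both assembly tasks write `f3`, so this sketch serializes them; a finer partitioning of the data into handles would expose more concurrency.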
Pages: 22