Recent Advances in Matrix Partitioning for Parallel Computing on Heterogeneous Platforms

被引:20
作者
Beaumont, Olivier [1 ,2 ]
Becker, Brett A. [3 ]
DeFlumere, Ashley [4 ]
Eyraud-Dubois, Lionel [1 ,2 ]
Lambert, Thomas [1 ,2 ]
Lastovetsky, Alexey [3 ]
机构
[1] Univ Bordeaux, INRIA, F-33405 Bordeaux, France
[2] Univ Bordeaux, LaBRI, F-33405 Bordeaux, France
[3] Univ Coll Dublin, Dublin 4, Ireland
[4] Wellesley Coll, Wellesley, MA 02481 USA
基金
爱尔兰科学基金会;
关键词
Approximation algorithms; high performance computing; matrix partitioning; parallel matrix multiplication; heterogeneous computing; RECTANGLES;
D O I
10.1109/TPDS.2018.2853151
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
The problem of partitioning dense matrices into sets of sub-matrices has received increased attention recently and is crucial when considering dense linear algebra and kernels with similar communication patterns on heterogeneous platforms. The problem of load balancing and minimizing communication is traditionally reducible to an optimization problem that involves partitioning a square into rectangles. This problem has been proven to be NP-Complete for an arbitrary number of partitions. In this paper, we present recent approaches that relax the restriction that all partitions be rectangles. The first approach uses an original mathematical technique to find the exact optimal partitioning. Due to the complexity of the technique, it has been developed for a small number of partitions only. However, even at a small scale, the optimal partitions found by this approach are often non-rectangular and sometimes non-intuitive. The second approach is the study of approximate partitioning methods utilizing recursive partitioning algorithms. In particular we use the work on optimal partitioning to improve pre-existing algorithms. In this paper we discuss the different perspectives this approach opens and present two algorithms, SNRPP which is a root 3/2 approximation, and SNRPP which is a 2/root 3 approximation. While sub-optimal, the NRRP approach works for an arbitrary number of partitions. We use the first exact approach to analyse how close to the known optimal solutions the NRRP algorithm is for small numbers of partitions.
引用
收藏
页码:218 / 229
页数:12
相关论文
共 26 条
[1]   StarPU: a unified platform for task scheduling on heterogeneous multicore architectures [J].
Augonnet, Cedric ;
Thibault, Samuel ;
Namyst, Raymond ;
Wacrenier, Pierre-Andre .
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2011, 23 (02) :187-198
[2]   MINIMIZING COMMUNICATION IN NUMERICAL LINEAR ALGEBRA [J].
Ballard, Grey ;
Demmel, James ;
Holtz, Olga ;
Schwartz, Oded .
SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS, 2011, 32 (03) :866-901
[3]   Partitioning a square into rectangles: NP-completeness and approximation algorithms [J].
Beaumont, O ;
Boudet, V ;
Rastello, F ;
Robert, Y .
ALGORITHMICA, 2002, 34 (03) :217-239
[4]   Static LU decomposition on heterogeneous platforms [J].
Beaumont, O ;
Legrand, A ;
Rastello, F ;
Robert, Y .
INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2001, 15 (03) :310-323
[5]   A proposal for a heterogeneous cluster ScaLAPACK (dense linear solvers) [J].
Beaumont, O ;
Boudet, V ;
Petitet, A ;
Rastello, F ;
Robert, Y .
IEEE TRANSACTIONS ON COMPUTERS, 2001, 50 (10) :1052-1070
[6]   A New Approximation Algorithm for Matrix Partitioning in Presence of Strongly Heterogeneous Processors [J].
Beaumont, Olivier ;
Eyraud-Dubois, Lionel ;
Lambert, Thomas .
2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2016), 2016, :474-483
[7]   Comparison of Static and Dynamic Resource Allocation Strategies for Matrix Multiplication [J].
Beaumont, Olivier ;
Eyraud-Dubois, Lionel ;
Guermouche, Abdou ;
Lambert, Thomas .
2015 27TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 2015, :170-177
[8]  
Becker B. A., 2010, THESIS
[9]  
Becker BA, 2007, ISPDC 2007: SIXTH INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING, PROCEEDINGS, P285
[10]  
Becker BA, 2006, 2006 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, VOLS 1 AND 2, P593