Region-based hierarchical operation partitioning for multicluster processors

Cited by: 15
Authors
Chu, M [1]
Fan, K [1]
Mahlke, S [1]
Affiliations
[1] Univ Michigan, Adv Comp Architecture Lab, Ann Arbor, MI 48109 USA
Keywords
algorithms; experimentation; performance; clustering; instruction-level parallelism; instruction scheduling; multicluster processor; operation partitioning; region-based compilation
DOI
10.1145/780822.781165
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Clustered architectures are a solution to the bottleneck of centralized register files in superscalar and VLIW processors. The main challenge associated with clustered architectures is compiler support to effectively partition operations across the available resources on each cluster. In this work, we present a novel technique for clustering operations based on graph partitioning methods. Our approach incorporates new methods of assigning weights to nodes and edges within the dataflow graph to guide the partitioner. Nodes are assigned weights to reflect their resource usage within a cluster, while a slack distribution method intelligently assigns weights to edges to reflect the cost of inserting moves across clusters. A multilevel graph partitioning algorithm, which globally divides a dataflow graph into multiple parts in a hierarchical manner, uses these weights to efficiently generate estimates for the quality of partitions. We found that our algorithm was able to achieve an average of 20% improvement in DSP kernels and 5% improvement in SPECint2000 for a four-cluster architecture.
Pages: 300-311
Page count: 12
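
The abstract describes weighting dataflow-graph nodes by resource usage and edges by how costly an intercluster move would be. The sketch below is one possible illustration of that idea, not the authors' algorithm: it computes per-edge scheduling slack from ASAP/ALAP times, gives a heavier weight to edges whose slack cannot absorb a move, and scores a candidate cluster assignment by cut weight and per-cluster load. All function names, the `move_latency` and `critical_weight` parameters, the unit node weights, and the example graph are assumptions made for illustration; the paper's slack distribution method and multilevel partitioner are not reproduced here.

```python
from collections import defaultdict

def edge_slack(nodes, edges, latency):
    """Return the scheduling slack of every dataflow edge.

    `nodes` must be listed in topological order (an assumption of this sketch).
    """
    preds, succs = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)
    # Earliest start time of each operation (ASAP).
    asap = {}
    for n in nodes:
        asap[n] = max((asap[p] + latency[p] for p in preds[n]), default=0)
    length = max(asap[n] + latency[n] for n in nodes)  # critical-path length
    # Latest start time that does not stretch the critical path (ALAP).
    alap = {}
    for n in reversed(nodes):
        alap[n] = min((alap[s] for s in succs[n]), default=length) - latency[n]
    return {(u, v): alap[v] - (asap[u] + latency[u]) for u, v in edges}

def edge_weights(slack, move_latency=1, critical_weight=10):
    """Hypothetical weighting: edges that cannot hide a move are expensive to cut."""
    return {e: (critical_weight if s < move_latency else 1)
            for e, s in slack.items()}

def partition_cost(assignment, weights, node_weight):
    """Return (total weight of cut edges, load per cluster) for an assignment."""
    cut = sum(w for (u, v), w in weights.items()
              if assignment[u] != assignment[v])
    load = defaultdict(int)
    for n, cluster in assignment.items():
        load[cluster] += node_weight[n]
    return cut, dict(load)

# Hypothetical five-operation dataflow graph; c has latency 2, the rest 1.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "c"), ("b", "c"), ("c", "e"), ("d", "e")]
latency = {"a": 1, "b": 1, "c": 2, "d": 1, "e": 1}
node_weight = {n: 1 for n in nodes}  # assumed: one resource slot per operation

slack = edge_slack(nodes, edges, latency)
weights = edge_weights(slack)

# Two candidate two-cluster assignments (operation -> cluster id).
good = {"a": 0, "b": 0, "c": 0, "e": 0, "d": 1}  # cuts only the slack-rich d->e edge
bad = {"a": 0, "c": 0, "e": 0, "b": 1, "d": 1}   # also cuts the critical b->c edge
print(partition_cost(good, weights, node_weight))
print(partition_cost(bad, weights, node_weight))
```

In this toy graph only the edge from `d` to `e` has slack, so an assignment that cuts just that edge scores a lower cut weight than one that also separates `b` from `c`, at the price of a less balanced load. A real partitioner would trade these two terms off against each other rather than report them separately.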