Pragmatic integrated scheduling for clustered VLIW architectures

Cited by: 8
Authors
Nagpal, Rahul [1]
Srikant, Y. N. [1]
Affiliation
[1] Indian Inst Sci, Dept CSA, Bangalore 560012, Karnataka, India
Keywords
scheduling; clustered VLIW architectures; cluster scheduling
DOI
10.1002/spe.826
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Scheduling for clustered architectures involves spatial concerns (where to schedule) as well as temporal concerns (when to schedule). Various clustered VLIW configurations, connectivity types, and inter-cluster communication models present different performance trade-offs to a scheduler. The scheduler is responsible for resolving the conflicting requirements of exploiting the parallelism offered by the hardware and limiting the communication among clusters to achieve better performance. In this paper, we describe our experience with developing a pragmatic scheme and also a generic graph-matching-based framework for cluster scheduling based on a generic and realistic clustered machine model. The proposed scheme effectively utilizes the exact knowledge of available communication slots, functional units, and load on different clusters as well as future resource and communication requirements known only at schedule time. The proposed graph-matching-based framework for cluster scheduling resolves the phase-ordering and fixed-ordering problems associated with earlier schemes for scheduling clustered VLIW architectures. The experimental evaluation in the context of a state-of-the-art commercial clustered architecture (using real-world benchmark programs) reveals a significant performance improvement over the earlier proposals, which were mostly evaluated using compiled simulation of hypothetical clustered architectures. Our results clearly highlight the importance of considering the peculiarities of commercial clustered architectures and of hard-nosed performance measurement. Copyright (c) 2007 John Wiley & Sons, Ltd.
Pages: 227-257
Number of pages: 31