Dynamic instruction scheduling in a trace-based multi-threaded architecture

Cited by: 0
Authors
Rounce, Peter A. [1]
De Souza, Alberto F. [2]
Affiliations
[1] UCL, Dept Comp Sci, London WC1E 6BT, England
[2] Univ Fed Espirito Santo, Dept Informat, BR-29075910 Vitoria, ES, Brazil
Keywords
simultaneous multi-threading; dynamic instruction scheduling; wide-issue architectures; VLIW
DOI
10.1007/s10766-007-0062-1
Chinese Library Classification
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Simulation results are presented for the hardware-implemented, trace-based dynamic instruction scheduler of our single-process DTSVLIW architecture, used here to schedule instructions from several processes into multiple streams of VLIW instructions for execution by a wide-issue, simultaneous multi-threading (SMT) execution engine. The scheduling process involves single-instruction execution of each process, with the executed instructions dynamically scheduled into blocks of VLIW instructions that are cached for subsequent SMT execution. SMT provides a mechanism to reduce the impact of horizontal and vertical waste, and of variable memory latencies, seen in the DTSVLIW. Preliminary experiments explore this extended model. Results show PE utilization of up to 87% on a 4-thread, 1-scalar, 8-PE design, with speed-ups of up to 6.3 times over a single processor. Notably, only a single scalar process needs to be scheduled at any time, and main memory fetches are 1-4% of those of a single processor.
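As a rough illustration of scheduling an executed instruction trace into a block of VLIW instructions, the following is a minimal sketch only. It is not the DTSVLIW hardware algorithm: the names `Instr`, `schedule_trace` and `utilization` are hypothetical, and the model assumes register-only dependencies, unit latency, no branches or memory operations, and that reads happen before writes within one long instruction.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: one optional destination register, some sources."""
    dest: Optional[str]
    srcs: Tuple[str, ...] = ()


def schedule_trace(trace: List[Instr], width: int) -> List[List[Instr]]:
    """Greedily place each traced instruction in the earliest long instruction
    that respects register dependencies and still has a free slot.
    Assumption: reads occur before writes inside one long instruction."""
    block: List[List[Instr]] = []
    last_write: Dict[str, int] = {}  # register -> row of its latest writer
    last_read: Dict[str, int] = {}   # register -> latest row that reads it
    for ins in trace:
        earliest = 0
        for reg in ins.srcs:
            # RAW: a consumer goes strictly after its producer's row.
            if reg in last_write:
                earliest = max(earliest, last_write[reg] + 1)
        if ins.dest is not None:
            # WAW: a later write goes strictly after the earlier write.
            if ins.dest in last_write:
                earliest = max(earliest, last_write[ins.dest] + 1)
            # WAR: a write may share a row with an earlier reader, not precede it.
            if ins.dest in last_read:
                earliest = max(earliest, last_read[ins.dest])
        row = earliest
        while row < len(block) and len(block[row]) >= width:
            row += 1  # slot conflict: try the next long instruction
        if row == len(block):
            block.append([])
        block[row].append(ins)
        for reg in ins.srcs:
            last_read[reg] = max(last_read.get(reg, -1), row)
        if ins.dest is not None:
            last_write[ins.dest] = row
    return block


def utilization(block: List[List[Instr]], width: int) -> float:
    """Fraction of issue slots filled across all long instructions in the block."""
    if not block:
        return 0.0
    return sum(len(row) for row in block) / (width * len(block))
```

In this toy model, empty slots within a long instruction correspond to horizontal waste; the sketch does not model vertical waste, memory latency, or the multi-threaded execution engine described in the abstract.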
Pages: 184-205
Page count: 22