CSMT: Simultaneous Multithreading for Clustered VLIW Processors

Cited by: 3
Authors
Gupta, Manoj [1]
Sanchez, Fermin [1]
Llosa, Josep [1]
Affiliations
[1] Univ Politecn Cataluna, Dept Arquitectura Computadors, ES-08034 Barcelona, Spain
Keywords
ILP; VLIW architectures; clustered VLIW architectures; multithreaded processors; simultaneous multithreading; ARCHITECTURE; PERFORMANCE;
DOI
10.1109/TC.2009.96
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Simultaneous MultiThreading (SMT) is a well-known technique that improves resource utilization by exploiting thread-level parallelism at the instruction grain level. However, implementing SMT for VLIWs requires complex structures, which is contrary to the VLIW philosophy of hardware simplicity. In this paper, we propose Cluster-level Simultaneous MultiThreading (CSMT) to allow some degree of SMT in clustered VLIW processors with low hardware cost and complexity. CSMT considers the set of operations that execute simultaneously in a given cluster as the assignment unit. To minimize cluster conflicts between threads, a very simple hardware-based cluster renaming mechanism is proposed. The hardware required to implement CSMT is cheap, realistic, and practical for a clustered VLIW processor. An analysis of the hardware required to implement CSMT shows that it is quite scalable, with up to eight threads easily supported at low hardware cost. The experimental results show that CSMT significantly improves performance when compared with other multithreading approaches suited for VLIW. For instance, with four threads, CSMT shows an average speedup of 110 percent over a single-thread VLIW architecture and 40 percent over Interleaved MultiThreading (IMT). In some cases, speedup can be as high as 225 percent over single-thread architecture and 84 percent over IMT.
Pages: 385-399
Number of pages: 15