Mitigating Amdahl's Law through EPI throttling

Cited by: 82
Authors
Annavaram, M [1]
Grochowski, E [1]
Shen, J [1]
Affiliations
[1] Intel Corp, Microarchitecture Res Lab, Santa Clara, CA 95054 USA
Source
32ND INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, PROCEEDINGS | 2005
DOI
10.1109/ISCA.2005.36
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
This paper is motivated by three recent trends in computer design. First, chip multi-processors (CMPs) with increasing numbers of CPU cores per chip are becoming common. Second, multi-threaded software that can take advantage of CMPs will soon become prevalent. Due to the nature of the algorithms, these multi-threaded programs inherently will have phases of sequential execution; Amdahl's law dictates that the speedup of such parallel programs will be limited by the sequential portion of the computation. Finally, increasing levels of on-chip integration coupled with a slowing rate of reduction in supply voltage make power consumption a first-order design constraint. Given this environment, our goal is to minimize the execution times of multi-threaded programs containing nontrivial parallel and sequential phases, while keeping the CMP's total power consumption within a fixed budget. In order to mitigate the effects of Amdahl's law, in this paper we make a compelling case for varying the amount of energy expended to process instructions according to the amount of available parallelism. Using the equation Power = Energy per instruction (EPI) * Instructions per second (IPS), we propose that during phases of limited parallelism (low IPS) the chip multi-processor will spend more EPI; similarly, during phases of higher parallelism (high IPS) the chip multi-processor will spend less EPI; in both scenarios power is fixed. We evaluate the performance benefits of an EPI throttle on an asymmetric multiprocessor (AMP) prototyped from a physical 4-way Xeon SMP server. Using a wide range of multi-threaded programs, we show a 38% wall clock speedup on an AMP compared to a standard SMP that uses the same power. We also measure the supply current on a 4-way SMP server while running the multi-threaded programs and use the measured data as input to a software simulator that implements a more flexible EPI throttle. The results from the measurement-driven simulation show performance benefits comparable to the AMP prototype. We analyze the results from both techniques, explain why and when an EPI throttle works well, and conclude with a discussion of the challenges in building practical EPI throttles.
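The relation Power = EPI * IPS in the abstract can be illustrated with a short Python sketch. This is not the authors' simulator or prototype; the function names, the 100 W budget, and the IPS values are hypothetical, chosen only to show that at a fixed power budget a phase with lower IPS leaves more energy available per instruction, alongside the standard form of Amdahl's law.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Standard Amdahl's law: speedup over one core when only the
    parallel fraction of the work scales with the core count."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)


def epi_at_fixed_power(power_watts, ips):
    """Energy per instruction (joules) available at a fixed power
    budget, rearranging Power = EPI * IPS into EPI = Power / IPS."""
    return power_watts / ips


# Hypothetical numbers, for illustration only.
POWER_BUDGET_W = 100.0      # assumed chip power budget
SEQUENTIAL_IPS = 1e9        # assumed IPS during a sequential phase
PARALLEL_IPS = 4e9          # assumed IPS during a parallel phase

sequential_epi = epi_at_fixed_power(POWER_BUDGET_W, SEQUENTIAL_IPS)
parallel_epi = epi_at_fixed_power(POWER_BUDGET_W, PARALLEL_IPS)

if __name__ == "__main__":
    print(f"EPI in sequential phase: {sequential_epi * 1e9:.1f} nJ")
    print(f"EPI in parallel phase:   {parallel_epi * 1e9:.1f} nJ")
    print(f"Amdahl speedup, 75% parallel on 4 cores: "
          f"{amdahl_speedup(0.75, 4):.3f}")
```

Under these assumed values the sequential phase can spend four times as much energy per instruction as the parallel phase while total power stays the same, which is the trade-off an EPI throttle exploits.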
Pages: 298-309
Page count: 12