Enabling Large Decoded Instruction Loop Caching for Energy-Aware Embedded Processors

Cited by: 1
Authors
Gu, Ji [1]
Guo, Hui [1]
Affiliation
[1] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW 2052, Australia
Source
PROCEEDINGS OF THE 2010 INTERNATIONAL CONFERENCE ON COMPILERS, ARCHITECTURES AND SYNTHESIS FOR EMBEDDED SYSTEMS (CASES '10) | 2010
Keywords
Cache hierarchy; filter cache; instruction decode; low power; low energy; embedded systems; POWER; CONSUMPTION; REDUCTION; DESIGN; BTB;
DOI
10.1145/1878921.1878957
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Low energy consumption in embedded processors grows in importance as system complexity increases. The on-chip instruction cache (I-cache) is usually one of the most energy-consuming components on the processor chip because of its large size and frequent accesses. To reduce this energy consumption, existing loop cache approaches use a tiny decoded cache to filter I-cache accesses and instruction decode activity for repeated loop iterations. However, such designs are effective only for small, simple loops and are suitable only for DSP kernel-like applications. They are ineffective for the many embedded applications in which complex loops are common. In this paper, we propose a decoded loop instruction cache (DLIC) that is small, hence energy efficient, yet can capture most loops, including large, nested ones with branch executions, so that a significant amount of I-cache accesses and instruction decoding can be eliminated. Experiments on a set of embedded benchmarks show that the proposed DLIC scheme can reduce energy consumption by up to 87%. On average, 66% of the energy spent on instruction fetching and decoding can be saved, at a performance overhead of only 1.4%.
Pages: 247-256
Page count: 10