Performance and Energy-Efficient Design of STT-RAM Last-Level Cache

Cited by: 19

Authors:

Hameed, Fazal [1,2]

Khan, Asif Ali [1]

Castrillon, Jeronimo [1]

Affiliations:

[1] Tech Univ Dresden, Chair Compiler Construct, D-01069 Dresden, Germany

[2] Inst Space Technol, Islamabad 44000, Pakistan

Source:

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS | 2018 / Vol. 26 / Issue 06

Keywords:

Architecture; cache; embedded systems; memory; memory hierarchy; CHIP DRAM CACHE;

DOI:

10.1109/TVLSI.2018.2804938

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Recent research has proposed having a die-stacked last-level cache (LLC) to overcome the memory wall. Lately, spin-transfer-torque random access memory (STT-RAM) caches have received attention, since they provide improved energy efficiency compared with DRAM caches. However, recently proposed STT-RAM cache architectures unnecessarily dissipate energy by fetching unneeded cache lines (CLs) into the row buffer (RB). In this paper, we propose a selective read policy for the STT-RAM which fetches those CLs into the RB that are likely to be reused. In addition, we propose a tags-update policy that reduces the number of STT-RAM writebacks. This reduces the number of reads/writes and thereby decreases the energy consumption. To reduce the latency penalty of our selective read policy, we propose the following performance optimizations: 1) an RB tags-bypass policy that reduces STT-RAM access latency; 2) an LLC data cache that stores the CLs that are likely to be used in the near future; 3) an address organization scheme that simultaneously reduces LLC access latency and miss rate; and 4) a tags-to-column mapping policy that improves access parallelism. For evaluation, we implement our proposed architecture in the Zesto simulator and run different combinations of SPEC2006 benchmarks on an eight-core system. We compare our approach with a recently proposed STT-RAM LLC with subarray parallelism support and show that our synergistic policies reduce the average LLC dynamic energy consumption by 75% and improve the system performance by 6.5%. Compared with the state-of-the-art DRAM LLC with subarray parallelism, our architecture reduces the LLC dynamic energy consumption by 82% and improves system performance by 6.8%.

Pages: 1059-1072

Page count: 14
