BATMAN: Techniques for Maximizing System Bandwidth of Memory Systems with Stacked-DRAM

Cited by: 28

Authors:

Chou, Chiachen [1]

Jaleel, Aamer [2]

Qureshi, Moinuddin [1]

Affiliations:

[1] Georgia Inst Technol, Sch ECE, Atlanta, GA 30332 USA

[2] NVIDIA, NVIDIA Res, Santa Clara, CA USA

Source:

MEMSYS 2017: Proceedings of the International Symposium on Memory Systems | 2017

Funding:

U.S. National Science Foundation

Keywords:

POLICIES

DOI:

10.1145/3132402.3132404

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Tiered-memory systems consist of high-bandwidth 3D-DRAM and high-capacity commodity-DRAM. Conventional designs attempt to improve system performance by maximizing the number of memory accesses serviced by 3D-DRAM. However, when the commodity-DRAM bandwidth is a significant fraction of overall system bandwidth, these techniques inefficiently utilize the total bandwidth offered by the tiered-memory system and yield suboptimal performance. In such situations, performance can be improved by distributing memory accesses in proportion to the bandwidth of each memory. Ideally, we want a simple and effective runtime mechanism that achieves the desired access distribution without requiring significant hardware or software support. This paper proposes Bandwidth-Aware Tiered-Memory Management (BATMAN), a runtime mechanism that manages the distribution of memory accesses in a tiered-memory system by explicitly controlling data movement. BATMAN monitors the number of accesses to both memories, and when the number of 3D-DRAM accesses exceeds the desired threshold, BATMAN disallows data movement from commodity-DRAM to 3D-DRAM and proactively moves data from 3D-DRAM to commodity-DRAM. We demonstrate BATMAN on systems that architect the 3D-DRAM as either a hardware-managed cache (cache mode) or a part of the OS-visible memory space (flat mode). Our evaluations on a system with 4GB 3D-DRAM and 32GB commodity-DRAM show that BATMAN improves performance by an average of 11% and 10% and energy-delay product by 13% and 11% for systems in the cache and flat modes, respectively. BATMAN incurs only an eight-byte hardware overhead and requires negligible software modification.
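The control rule described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the names `target_fraction`, `AccessMonitor`, and the returned action strings are hypothetical, and the proportional target (3D-DRAM bandwidth divided by total bandwidth) is one plausible reading of "in proportion to the bandwidth of each memory". Any bandwidth values passed in are illustrative, since the abstract does not state them.

```python
from dataclasses import dataclass


def target_fraction(bw_stacked: float, bw_commodity: float) -> float:
    """Share of accesses the 3D-DRAM would serve if traffic were split
    in proportion to bandwidth (assumed interpretation of the abstract)."""
    return bw_stacked / (bw_stacked + bw_commodity)


@dataclass
class AccessMonitor:
    """Hypothetical model of the two access counters and the threshold check."""
    target: float                 # desired 3D-DRAM share of all accesses
    stacked_accesses: int = 0     # accesses serviced by 3D-DRAM
    commodity_accesses: int = 0   # accesses serviced by commodity-DRAM

    def record(self, served_by_stacked: bool) -> None:
        # Count one access against the memory that serviced it.
        if served_by_stacked:
            self.stacked_accesses += 1
        else:
            self.commodity_accesses += 1

    def stacked_share(self) -> float:
        total = self.stacked_accesses + self.commodity_accesses
        return self.stacked_accesses / total if total else 0.0

    def decide(self) -> str:
        # Above the threshold: stop moving data into 3D-DRAM and
        # proactively move data back to commodity-DRAM.
        if self.stacked_share() > self.target:
            return "block_promotion_and_demote"
        # Otherwise data movement into 3D-DRAM remains allowed.
        return "allow_promotion"
```

For example, with an assumed 4:1 bandwidth ratio, `target_fraction(4.0, 1.0)` gives a target share of 0.8; a monitor that has seen 90 of 100 accesses hit 3D-DRAM would then return `"block_promotion_and_demote"`, while one that has seen 70 of 100 would return `"allow_promotion"`.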


Pages: 268-280

Page count: 13

