Integrated 3D-Stacked Server Designs for Increasing Physical Density of Key-Value Stores

Cited by: 14
Authors
Gutierrez, Anthony [1]
Cieslak, Michael [1]
Giridhar, Bharan [1]
Dreslinski, Ronald G. [1]
Ceze, Luis [2]
Mudge, Trevor [1]
Affiliations
[1] Univ Michigan, Adv Comp Architecture Lab, Ann Arbor, MI 48109 USA
[2] Univ Washington, Comp Sci & Engn Dept, Seattle, WA 98195 USA
Keywords
Design; Performance; 3D Integration; Data Centers; Key-Value Stores; Physical Density; Scale-Out Systems; ENERGY
DOI
10.1145/2541940.2541951
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
Key-value stores, such as Memcached, have been used to scale web services since the beginning of the Web 2.0 era. Data center real estate is expensive, and several industry experts we have spoken to have suggested that a significant portion of their data center space is devoted to key-value stores. Despite its widespread use, there is little in the way of hardware specialization for increasing the efficiency and density of Memcached; it is currently deployed on commodity servers that contain high-end CPUs designed to extract as much instruction-level parallelism as possible. Out-of-order CPUs, however, have been shown to be inefficient when running Memcached. To address Memcached efficiency issues, we propose two architectures using 3D stacking to increase data storage efficiency. Our first 3D architecture, Mercury, consists of stacks of ARM Cortex-A7 cores with 4GB of DRAM, as well as NICs. Our second architecture, Iridium, replaces DRAM with NAND Flash to improve density. We explore, through simulation, the potential efficiency benefits of running Memcached on servers that use 3D stacking to closely integrate low-power CPUs with NICs and memory. With Mercury we demonstrate that density may be improved by 2.9x, power efficiency by 4.9x, throughput by 10x, and throughput per GB by 3.5x over a state-of-the-art server running optimized Memcached. With Iridium we show that density may be increased by 14x, power efficiency by 2.4x, and throughput by 5.2x, while still meeting latency requirements for a majority of requests.
Pages: 485-498
Number of pages: 14