Rhythm: Harnessing Data Parallel Hardware for Server Workloads

Cited by: 21
Authors
Agrawal, Sandeep R. [1]
Pistol, Valentin [1]
Pang, Jun [1]
Tran, John [2]
Tarjan, David [2]
Lebeck, Alvin R. [1]
Affiliations
[1] Duke Univ, Durham, NC 27706 USA
[2] NVIDIA, Santa Clara, CA USA
Funding
U.S. National Science Foundation
Keywords
high throughput; power efficiency; execution similarity
DOI
10.1145/2541940.2541956
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Trends in increasing web traffic demand an increase in server throughput while preserving energy efficiency and total cost of ownership. Present work in optimizing data center efficiency primarily focuses on the data center as a whole, using off-the-shelf hardware for individual servers. Server capacity is typically increased by adding more machines, which is cheap, though inefficient in the long run in terms of energy and area. Our work builds on the observation that server workload execution patterns are not completely unique across multiple requests. We present a framework, called Rhythm, for high throughput servers that can exploit similarity across requests to improve server performance and power/energy efficiency by launching data parallel executions for request cohorts. An implementation of the SPECWeb Banking workload using Rhythm on NVIDIA GPUs provides a basis for evaluating both software and hardware for future cohort-based servers. Our evaluation of Rhythm on future server platforms shows that it achieves 4x the throughput (reqs/sec) of a Core i7 at efficiencies (reqs/Joule) comparable to a dual-core ARM Cortex A9. A Rhythm implementation that generates transposed responses achieves 8x the i7 throughput while processing 2.5x more requests/Joule compared to the A9.
Pages: 19-34
Page count: 16