A distributed in-memory key-value store system on heterogeneous CPU-GPU cluster

Cited by: 4
Authors
Zhang, Kai [1]
Wang, Kaibo [2]
Yuan, Yuan [3]
Guo, Lei [2]
Li, Rubao [3]
Zhang, Xiaodong [3]
He, Bingsheng [4]
Hu, Jiayu [5]
Hua, Bei [5]
Affiliations
[1] Fudan Univ, Shanghai, Peoples R China
[2] Google Inc, Mountain View, CA USA
[3] Ohio State Univ, Columbus, OH 43210 USA
[4] Natl Univ Singapore, Singapore, Singapore
[5] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
Funding
National Science Foundation (USA);
Keywords
Key-value store; GPU; Heterogeneous systems; Distributed systems; Energy efficiency;
DOI
10.1007/s00778-017-0479-0
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In-memory key-value stores play a critical role in many data-intensive applications by providing high-throughput, low-latency data access. They have several unique properties: (1) data-intensive operations that demand high memory bandwidth for fast data access, (2) high data parallelism and simple computing operations that demand many slim parallel computing units, and (3) a large working set. However, our experiments show that homogeneous multicore CPU systems are increasingly mismatched to these properties: they do not provide massive data parallelism or high memory bandwidth; their limited number of powerful computing cores does not satisfy the demands of this data processing task; and their cache hierarchy may provide little benefit for the large working set. In this paper, we present the design and implementation of Mega-KV, a distributed in-memory key-value store system on a heterogeneous CPU-GPU cluster. By effectively utilizing the high memory bandwidth and latency-hiding capability of GPUs, Mega-KV provides fast data access and significantly improves overall performance and energy efficiency over homogeneous CPU architectures. Mega-KV shows excellent scalability, processing up to 623 million key-value operations per second on a cluster equipped with eight CPUs and eight GPUs while delivering an efficiency of up to 299 thousand operations per watt (KOPS/W).
Pages: 729-750
Page count: 22