On big data benchmarking

Cited by: 13
Authors
Han, Rui [1]
Lu, Xiaoyi [2]
Xu, Jiangtao [3]
Affiliations
[1] Department of Computing, Imperial College London, London
[2] Ohio State University, Columbus
[3] Beijing Jiaotong University, Beijing
Source
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | 2014 / Vol. 8807
Keywords
Benchmark; Big data systems; Data; Tests
DOI
10.1007/978-3-319-13021-7_1
Abstract
Big data systems address the challenges of capturing, storing, managing, analyzing, and visualizing big data. Within this context, developing benchmarks to evaluate and compare big data systems has become an active topic for both the research and industry communities. To date, most state-of-the-art big data benchmarks are designed for specific types of systems. Based on our experience, however, we argue that, given the complexity, diversity, and rapid evolution of big data systems, big data benchmarks must include a diversity of data and workloads in order to be fair. Given this motivation, in this paper we first propose the key requirements and challenges in developing big data benchmarks from two perspectives: generating data with the 4V properties of big data (i.e., volume, velocity, variety, and veracity), and generating tests with comprehensive workloads for big data systems. We then present a methodology for big data benchmarking designed to address these challenges. Next, state-of-the-art benchmarks are summarized and compared, followed by our vision for future research directions. © Springer International Publishing Switzerland 2014.
Pages: 3-18
Number of pages: 15