Get Real: How Benchmarks Fail to Represent the Real World

Cited by: 33
Authors
Vogelsgesang, Adrian [1 ]
Haubenschild, Michael [1 ]
Finis, Jan [1 ]
Kemper, Alfons [1 ]
Leis, Viktor [1 ]
Muehlbauer, Tobias [1 ]
Neumann, Thomas [1 ]
Then, Manuel [1 ]
Affiliations
[1] Tableau Software, Seattle, WA 98103 USA
Source
DBTest'18: Proceedings of the Workshop on Testing Database Systems | 2018
DOI
10.1145/3209950.3209952
CLC (Chinese Library Classification)
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Industrial as well as academic analytics systems are usually evaluated on well-known standard benchmarks such as TPC-H or TPC-DS. These benchmarks exercise various components of the DBMS, including the join optimizer, the implementations of the join and aggregation operators, concurrency control, and the scheduler. However, they fall short of evaluating the "real" challenges posed by modern BI systems such as Tableau, which emit machine-generated query workloads. This paper reports a comprehensive study of more than 60k real-world BI data repositories together with their generated query workloads. The machine-generated workload posed by BI tools differs from "hand-crafted" benchmark queries in multiple ways: structurally simple relational operator trees often come with extremely complex scalar expressions, so that expression evaluation becomes the limiting factor. At the same time, we also encountered much more complex relational operator trees than those covered by benchmarks. This long tail in both operator tree and expression complexity is not adequately represented in standard benchmarks. We contribute various statistics gathered from the large dataset, e.g., data type distributions, operator frequencies, string length distributions, and expression complexity. We hope our study gives an impetus to database researchers and benchmark designers alike to address the relevant problems in future projects and to enable better database support for data exploration systems, which are becoming increasingly important in the Big Data era.
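The abstract's central observation, that BI tools emit queries whose relational operator tree is trivial while the scalar expressions inside it are deeply nested, can be illustrated with a hypothetical sketch (not taken from the paper; the `sales` table, the data, and the CASE-nesting scheme are invented for illustration), using Python's built-in sqlite3:

```python
import sqlite3

# Hypothetical illustration: the operator tree is just scan -> aggregate,
# but the scalar expression is a machine-generated, deeply nested CASE chain.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EMEA", 10.0), ("APAC", 20.0), ("AMER", 30.0)])

# Wrap `amount` in one CASE per threshold. Each level copies the previous
# expression text three times, so the SQL grows exponentially with depth;
# at depth 3 the expression already contains 13 CASE keywords.
expr = "amount"
for threshold in (5, 15, 25):
    expr = f"CASE WHEN {expr} > {threshold} THEN {expr} - {threshold} ELSE {expr} END"

query = f"SELECT region, SUM({expr}) FROM sales GROUP BY region"
rows = conn.execute(query).fetchall()
```

Even this toy query spends its effort in expression evaluation rather than in joins or aggregation, which is the failure mode the paper argues standard benchmarks do not capture.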
Pages: 6