A New Benchmark Harness for Systematic and Robust Evaluation of Streaming State Stores

Cited by: 9
Authors
Asyabi, Esmail [1]
Wang, Yuanli [1]
Liagouris, John [1]
Kalavri, Vasiliki [1]
Bestavros, Azer [1]
Affiliations
[1] Boston Univ, Boston, MA 02215 USA
Source
PROCEEDINGS OF THE SEVENTEENTH EUROPEAN CONFERENCE ON COMPUTER SYSTEMS (EUROSYS '22) | 2022
Keywords
stream processing; KV store; benchmark; management
DOI
10.1145/3492321.3519592
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Modern stream processing systems often rely on embedded key-value stores, like RocksDB, to manage the state of long-running computations. Evaluating the performance of these stores when used for streaming workloads is cumbersome, as it requires the configuration and deployment of a stream processing system that integrates the respective store, and the execution of representative queries to collect measurements. To address this issue, in this paper, we start with an empirical characterization of streaming state access workloads collected from Apache Flink and RocksDB, using three publicly available datasets, and we show that the characteristics of real traces cannot be approximated with existing benchmarks. Next, we present Gadget, a new benchmark harness that generates realistic streaming state access workloads to enable easy and thorough performance evaluation of standalone KV stores through accurate simulation of streaming operator logic. Finally, we use Gadget to investigate the suitability of RocksDB as the de facto KV store for stream processing systems. Interestingly, we find that, although RocksDB provides robust results, it is outperformed by FASTER and BerkeleyDB in six out of eleven workloads. Our results reveal a wide performance gap between the current performance of streaming state stores and what could be achieved with workload-aware approaches.
Pages: 559-574
Page count: 16