Exploring System and Machine Learning Performance Interactions when Tuning Distributed Data Stream Applications

Cited by: 1
Authors
Odysseos, Lambros [1]
Herodotou, Herodotos [1]
Affiliations
[1] Cyprus Univ Technol, Limassol, Cyprus
Source
2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDEW 2022) | 2022
Keywords
stream processing; machine learning; system parameter tuning; hyper-parameter tuning
DOI
10.1109/ICDEW55742.2022.00008
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deploying machine learning (ML) applications over distributed stream processing engines (DSPEs) such as Apache Spark Streaming is a complex procedure that requires extensive tuning along two dimensions. First, DSPEs have a vast array of system configuration parameters (such as degree of parallelism, memory buffer sizes, etc.) that need to be optimized to achieve the desired levels of latency and/or throughput. Second, each ML model has its own set of hyper-parameters that need to be tuned as they significantly impact the overall prediction accuracy of the trained model. These two forms of tuning have been studied extensively in the literature but only in isolation from each other. This position paper identifies the necessity for a combined system and ML model tuning approach based on a thorough experimental study. In particular, experimental results have revealed unexpected and complex interactions between the choices of system configuration and hyper-parameters, and their impact on both application and model performance. These findings open up new research directions in the field of self-managing stream processing systems.
Pages: 24-29
Number of pages: 6