Sequence-to-sequence models for workload interference prediction on batch processing datacenters

Cited by: 7
Authors
Buchaca, David [1, 2]
Marcual, Joan [2]
Lluis Berral, Josep [1, 2]
Carrera, David [1, 2]
Affiliations
[1] BSC, C Jordi Girona 1-3, Barcelona 08034, Spain
[2] UPC, BarcelonaTECH, Barcelona, Spain
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2020 / Vol. 110
Funding
European Research Council
Keywords
Resource management; Sequence-to-sequence; Workload interference; Deep learning; Workload placement
DOI
10.1016/j.future.2020.03.058
Chinese Library Classification number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
Co-scheduling of jobs in data centers is a challenging scenario where jobs can compete for resources, leading to severe slowdowns or failed executions. Efficient job placement in environments where resources are shared requires awareness of how jobs interfere during execution, going far beyond ineffective resource overbooking techniques. Current techniques, most of which already involve machine learning and job modeling, are based on summarizing workload behavior over time, rather than focusing on the effective job requirements at each instant of the execution. In this work, we propose a methodology for modeling the co-scheduling of jobs in data centers, based on their behavior towards resources and execution time, using sequence-to-sequence models based on recurrent neural networks. The goal is to forecast the resource footprint of co-executed jobs throughout their execution time, from the profiles shown by the individual jobs, in order to enhance the placement decisions of resource managers and schedulers. The methods presented herein are validated using High Performance Computing benchmarks based on different frameworks (such as Hadoop and Spark) and applications (CPU bound, IO bound, machine learning, SQL queries, and others). Experiments show that the model can correctly identify the resource usage trends of previously seen and even unseen co-scheduled jobs. (C) 2020 Elsevier B.V. All rights reserved.
Pages: 155-166
Number of pages: 12