Inferring Workflows with Job Dependencies from Distributed Processing Systems Logs (Or, how to evaluate your systems with realistic workflows NOT pulled out of thin air)

Times cited: 0
Authors
Carrillo, Gladys E. [1]
Abad, Cristina L. [1]
Affiliations
[1] Escuela Super Politecn Litoral, ESPOL, Campus Gustavo Galindo Km 30-5 Via Perimetral, Guayaquil, Ecuador
Source
2017 IEEE 15TH INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, 15TH INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, 3RD INTL CONF ON BIG DATA INTELLIGENCE AND COMPUTING AND CYBER SCIENCE AND TECHNOLOGY CONGRESS(DASC/PICOM/DATACOM/CYBERSCI | 2017
Keywords
Distributed processing; clusters; data mining; Hadoop; workflows; workloads; FRAMEWORK;
DOI
10.1109/DASC-PICom-DataCom-CyberSciTec.2017.168
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We consider the problem of evaluating new improvements to distributed processing platforms like Spark and Hadoop. One approach commonly used when evaluating these systems is to use workloads published by companies with large data clusters, like Google and Facebook. These evaluations seek to demonstrate the benefits of improvements to critical framework components like the job scheduler, under realistic workloads. However, published workloads typically do not contain information on dependencies between the jobs. This is problematic, as ignoring dependencies could lead to significantly misestimating the speedup obtained from a particular improvement. In this position paper, we discuss why it is important to include job dependency information when evaluating distributed processing frameworks, and show that workflow mining techniques can be used to obtain dependencies from job traces that lack them. As a proof-of-concept, we show that the proposed methodology is able to find workflows in traces published by Google.
Pages: 1025-1030
Page count: 6