L-Heron: An open-source load-aware online scheduler for Apache Heron

Cited by: 2
Authors
Zhang, Yitian [1]
Yu, Jiong [1,2]
Lu, Liang [3]
Li, Ziyang [2]
Meng, Zhao [1]
Affiliations
[1] Xinjiang Univ, Sch Software, Urumqi 830008, Peoples R China
[2] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi 830046, Peoples R China
[3] Civil Aviat Univ China, Sch Comp Sci & Technol, Tianjin 300300, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Big data; Stream processing; Apache Heron; Task scheduling; Load balancing; STREAM; MODEL;
DOI
10.1016/j.sysarc.2020.101727
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline code
0812;
Abstract
Apache Heron has emerged as a promising Data Stream Processing System (DSPS). However, it lacks an intelligent scheduling strategy, which causes significant performance degradation for streaming applications in certain scenarios. In this paper, we first illustrate the inefficiencies and challenges of Heron's default scheduling in current practice through experimental observations and analysis. Motivated by these observations, we propose L-Heron, an online scheduler built on Heron with the following features: (i) using runtime information, it improves data processing efficiency through load-aware online scheduling, which heuristically minimizes the overall communication overhead by identifying the traffic load; (ii) it is load-aware, so it can effectively balance the workload of a topology and avoid the heavy performance loss caused by overloaded worker nodes; (iii) it provides an online scheduling interface that is transparent to users, allowing them to focus on their scheduling logic and deploy it to the system easily. Additionally, we evaluated L-Heron on well-known example topologies and a realistic application. Extensive experimental results show that L-Heron is consistently effective across multiple metrics, including system completion latency, inter-node traffic, CPU utilization and throughput, compared with Heron and recent related work.
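The abstract does not spell out L-Heron's scheduling algorithm. As a rough illustration of the general idea it describes (placing heavily communicating tasks on the same worker node to cut inter-node traffic, while capping per-node load to avoid overloading), the Python sketch below implements a generic greedy traffic-aware heuristic. The function names, the `traffic` and `load` dictionaries, and the single scalar `capacity` are hypothetical assumptions for illustration; this is not L-Heron's actual algorithm or Heron's scheduler API.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def greedy_traffic_aware_placement(
    traffic: Dict[Pair, float],
    load: Dict[str, float],
    nodes: List[str],
    capacity: float,
) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Hypothetical greedy sketch: colocate heavy-traffic task pairs under a per-node load cap.

    traffic  -- (task_a, task_b) -> observed tuple rate between two distinct tasks
    load     -- task -> estimated CPU load (hypothetical units); must cover every task
    nodes    -- worker node ids
    capacity -- maximum total load a single node may host
    """
    placement: Dict[str, str] = {}
    node_load = {n: 0.0 for n in nodes}

    def fits(node: str, extra: float) -> bool:
        return node_load[node] + extra <= capacity

    def least_loaded(extra: float) -> str:
        # Load balancing: among nodes with room, pick the one currently least loaded.
        candidates = [n for n in nodes if fits(n, extra)]
        if not candidates:
            raise ValueError("not enough capacity to place task")
        return min(candidates, key=lambda n: node_load[n])

    def place(task: str, node: str) -> None:
        placement[task] = node
        node_load[node] += load[task]

    # Visit communicating pairs from heaviest to lightest traffic.
    for (a, b), _rate in sorted(traffic.items(), key=lambda kv: kv[1], reverse=True):
        unplaced = [t for t in (a, b) if t not in placement]
        if len(unplaced) == 2:
            pair_load = load[a] + load[b]
            if any(fits(n, pair_load) for n in nodes):
                # Both endpoints on one node keeps this edge's traffic local.
                node = least_loaded(pair_load)
                place(a, node)
                place(b, node)
            else:
                place(a, least_loaded(load[a]))
                place(b, least_loaded(load[b]))
        elif len(unplaced) == 1:
            task = unplaced[0]
            partner = b if task == a else a
            home = placement[partner]
            # Prefer the partner's node; fall back to the least-loaded node if it is full.
            place(task, home if fits(home, load[task]) else least_loaded(load[task]))

    # Tasks with no recorded traffic are spread purely by load.
    for task in load:
        if task not in placement:
            place(task, least_loaded(load[task]))

    return placement, node_load


def inter_node_traffic(traffic: Dict[Pair, float], placement: Dict[str, str]) -> float:
    """Total tuple rate on edges whose endpoints sit on different nodes."""
    return sum(rate for (a, b), rate in traffic.items() if placement[a] != placement[b])
```

For example, with a chain of four tasks s, b1, b2, b3 (pairwise rates 100, 80 and 10), unit loads, two nodes and a capacity of 2.0, the sketch keeps s and b1 on one node and b2 and b3 on the other, so only the 80-tuple/s edge crosses nodes. An online scheduler would additionally re-run such a placement as runtime metrics change, which this static sketch omits.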
Pages: 17