L-Heron: An open-source load-aware online scheduler for Apache Heron

Cited by: 2
Authors
Zhang, Yitian [1]
Yu, Jiong [1, 2]
Lu, Liang [3]
Li, Ziyang [2]
Meng, Zhao [1]
Affiliations
[1] Xinjiang Univ, Sch Software, Urumqi 830008, Peoples R China
[2] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi 830046, Peoples R China
[3] Civil Aviat Univ China, Sch Comp Sci & Technol, Tianjin 300300, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Big data; Stream processing; Apache Heron; Task scheduling; Load balancing; STREAM; MODEL
DOI
10.1016/j.sysarc.2020.101727
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Apache Heron has emerged as a promising Data Stream Processing System (DSPS). However, it lacks an intelligent scheduling strategy, which results in significant performance degradation for streaming applications in certain scenarios. In this paper, we first illustrate the inefficiencies and challenges of Heron's default scheduling in current practice through experimental observations and analysis. Motivated by these observations, we propose L-Heron, an online scheduler based on Heron, which has the following features: (i) using runtime information, it improves data processing efficiency through load-aware online scheduling, which identifies the traffic load and heuristically minimizes the overall communication overhead; (ii) it is load aware, balancing the workload of a topology to avoid the heavy performance loss caused by overloaded worker nodes; (iii) it provides an online scheduling interface that is transparent to users, allowing them to focus on their scheduling logic and easily deploy it to the system. We also evaluated L-Heron on well-known example topologies and a realistic application. Extensive experimental results show that L-Heron is consistently effective across multiple metrics, including system completion latency, inter-node traffic, CPU utilization, and throughput, compared with Heron and recent related work.
Pages: 17