Energy consumption analysis of data stream processing: a benchmarking approach

Cited by: 4
Authors
Dayarathna, Miyuru [1]
Li, Yuanlong [1]
Wen, Yonggang [1]
Fan, Rui [1]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore, Singapore
Funding
EU Horizon 2020;
Keywords
data stream processing; energy consumption analysis; benchmarking; workload characterization; distributed systems; linear road; DATA CENTERS; PERFORMANCE; MANAGEMENT;
DOI
10.1002/spe.2458
Chinese Library Classification number
TP31 [Computer Software];
Subject classification code
081202; 0835;
Abstract
Energy efficiency of data analysis systems has become a very important issue in recent times because of the increasing costs of data center operations. Although distributed streaming workloads have increasingly been present in modern data centers, energy-efficient scheduling of such applications remains a significant challenge. In this paper, we conduct an energy consumption analysis of data stream processing systems in order to identify their energy consumption patterns. We follow a stream system benchmarking approach to address this issue. Specifically, we implement the Linear Road benchmark on six stream processing environments (S4, Storm, ActiveMQ, Esper, Kafka, and Spark Streaming) and characterize these systems' performance in a real-world data center. We study the energy consumption characteristics of each system with a varying number of roads as well as with different types of component layouts. We also use a microbenchmark to capture raw energy consumption characteristics. We observed that the S4, Esper, and Spark Streaming environments had the highest average energy consumption efficiencies compared with the other systems. Using a neural network-based technique with the power/performance information gathered from our experiments, we developed a model for the power consumption behavior of a streaming environment. We observed that energy-efficient execution of a streaming application cannot be specifically attributed to the system CPU usage. We also observed that communication between compute nodes with moderate tuple sizes and scheduling plans with balanced system overhead produce better power consumption behavior in the context of data stream processing systems. Copyright (c) 2016 John Wiley & Sons, Ltd.
Pages: 1443-1462
Page count: 20