HybridTune: Spatio-Temporal Performance Data Correlation for Performance Diagnosis of Big Data Systems

Cited by: 0
Authors
Rui Ren
Jiechao Cheng
Xi-Wen He
Lei Wang
Jian-Feng Zhan
Wan-Ling Gao
Chun-Jie Luo
Affiliations
[1] Chinese Academy of Sciences, Institute of Computing Technology
[2] University of Chinese Academy of Sciences, School of Computing
[3] National University of Singapore
Source
Journal of Computer Science and Technology | 2019 / Vol. 34
Keywords
Big Data system; spatio-temporal correlation; rule-based diagnosis; machine learning
DOI
Not available
Abstract
With the tremendous growth of interest in Big Data, improving the performance of Big Data systems becomes more and more important. Among many steps, the first is to analyze and diagnose the performance bottlenecks of Big Data systems. Currently, there are two major solutions. One is the pure data-driven diagnosis approach, which may be very time-consuming; the other is the rule-based analysis method, which usually requires prior knowledge. For Big Data applications like Spark workloads, we observe that the tasks in the same stage normally execute the same or similar code on each data partition. On the basis of the stage similarity and distributed characteristics of Big Data systems, we analyze the behaviors of Big Data applications in terms of both system and micro-architectural metrics of each stage. Furthermore, for different performance problems, we propose a hybrid approach that combines prior rules and machine learning algorithms to detect performance anomalies, such as straggler tasks, task assignment imbalance, data skew, abnormal nodes and outlier metrics. Following this methodology, we design and implement a lightweight, extensible tool, named HybridTune, and measure the overhead and anomaly detection effectiveness of HybridTune using the BigDataBench benchmarks. Our experiments show that the overhead of HybridTune is only 5%, and the accuracy of the outlier detection algorithm reaches up to 93%. Finally, we report several use cases diagnosing Spark and Hadoop workloads using BigDataBench, which demonstrate the potential use of HybridTune.
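The stage-similarity observation in the abstract suggests a simple rule-based check. The sketch below is illustrative only and is not taken from HybridTune. It assumes per-task durations for one stage are already available, and it flags a task as a straggler when its duration exceeds a multiple of the stage median. The function name `find_stragglers`, the 1.5x threshold, and the sample durations are all hypothetical.

```python
from statistics import median


def find_stragglers(durations_by_task, threshold=1.5):
    """Return sorted ids of tasks slower than threshold x the stage median.

    durations_by_task maps a task id to its duration in seconds.
    The 1.5x default is an assumed heuristic, not a value from the paper.
    """
    if not durations_by_task:
        return []
    stage_median = median(durations_by_task.values())
    return sorted(
        task_id
        for task_id, duration in durations_by_task.items()
        if duration > threshold * stage_median
    )


# Hypothetical durations (seconds) for five tasks of one stage.
stage_durations = {"t0": 10.2, "t1": 9.8, "t2": 10.5, "t3": 31.0, "t4": 10.0}
stragglers = find_stragglers(stage_durations)
```

Because tasks in one stage run the same or similar code, the median is a reasonable baseline for that stage. A fixed rule like this would plausibly be one of the "prior rules" side of a hybrid scheme, with learned models handling cases a threshold cannot express.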
Pages: 1167 - 1184
Page count: 17
Related papers
50 in total
  • [11] Application of Mixtures of Gaussians for Tracking Clusters in Spatio-temporal Data
    Ertl, Benjamin
    Meyer, Joerg
    Streit, Achim
    Schneider, Matthias
    KDIR: PROCEEDINGS OF THE 11TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT - VOL 1: KDIR, 2019, : 45 - 54
  • [12] Feature Selection on Spatio-Temporal Data for Solar Irradiance Forecasting
    Carranza-Garcia, Manuel
    Lara-Benitez, Pedro
    Maria Luna-Romera, Jose
    Riquelme, Jose C.
    16TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING MODELS IN INDUSTRIAL AND ENVIRONMENTAL APPLICATIONS (SOCO 2021), 2022, 1401 : 654 - 664
  • [13] A learning approach for query planning on spatio-temporal IoT data
    Hoan Nguyen Mau Quoc
    Serrano, Martin
    Breslin, John G.
    Danh Le Phuoc
    PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON THE INTERNET OF THINGS (IOT'18), 2018
  • [14] A Tool for Spatio-Temporal Analysis of Social Anxiety with Twitter Data
    Lee, Joohong
    Sohn, Dongyoung
    Choi, Yong Suk
    SAC '19: PROCEEDINGS OF THE 34TH ACM/SIGAPP SYMPOSIUM ON APPLIED COMPUTING, 2019, : 2120 - 2123
  • [15] Spatio-Temporal Forecasting: A Survey of Data-Driven Models Using Exogenous Data
    Berkani, Safaa
    Guermah, Bassma
    Zakroum, Mehdi
    Ghogho, Mounir
    IEEE ACCESS, 2023, 11 : 75191 - 75214
  • [16] Predictive spatio-temporal models for spatially sparse environmental data
    de Luna, X
    Genton, MG
    STATISTICA SINICA, 2005, 15 (02) : 547 - 568
  • [17] Reliable information system for identifying spatio-temporal continuity of kinetic deformed objects with big point cloud data
    Chen, Claire Y. T.
    Sun, Edward W.
    Lin, Yi-Bing
    ANNALS OF OPERATIONS RESEARCH, 2023, 349 (1) : 103 - 138
  • [18] Spatio-temporal data generation based on separated attention for ENSO prediction
    Lin, Lianlei
    Wang, Junkai
    Tan, Aidi
    Chen, Jiawei
    APPLIED INTELLIGENCE, 2024, 54 (21) : 10473 - 10489
  • [19] Spatio-temporal analysis of urban crime leveraging multisource crowdsensed data
    Zhou B.
    Chen L.
    Zhao S.
    Zhou F.
    Li S.
    Pan G.
    Personal and Ubiquitous Computing, 2023, 27 (03) : 599 - 612
  • [20] Data-driven spatio-temporal analysis of consolidation for rapid reclamation
    Shi, Chao
    Wang, Yu
    GEOTECHNIQUE, 2023, 74 (07) : 676 - 696