HybridTune: Spatio-Temporal Performance Data Correlation for Performance Diagnosis of Big Data Systems

Authors
Rui Ren
Jiechao Cheng
Xi-Wen He
Lei Wang
Jian-Feng Zhan
Wan-Ling Gao
Chun-Jie Luo
Affiliations
[1] Chinese Academy of Sciences, Institute of Computing Technology
[2] University of Chinese Academy of Sciences, School of Computing
[3] National University of Singapore
Source
Journal of Computer Science and Technology | 2019, Vol. 34, No. 6
Keywords
Big Data system; spatio-temporal correlation; rule-based diagnosis; machine learning
DOI
Not available
Abstract
With the tremendous growth of interest in Big Data, improving the performance of Big Data systems is becoming increasingly important. The first step is to analyze and diagnose the performance bottlenecks of these systems. Currently, there are two major approaches. One is purely data-driven diagnosis, which can be very time-consuming; the other is rule-based analysis, which usually requires prior knowledge. For Big Data applications such as Spark workloads, we observe that the tasks in the same stage normally execute the same or similar code on each data partition. Based on this stage similarity and the distributed characteristics of Big Data systems, we analyze the behavior of Big Data applications in terms of both system and micro-architectural metrics for each stage. Furthermore, for different performance problems, we propose a hybrid approach that combines prior rules and machine learning algorithms to detect performance anomalies such as straggler tasks, task assignment imbalance, data skew, abnormal nodes, and outlier metrics. Following this methodology, we design and implement a lightweight, extensible tool named HybridTune, and measure its overhead and anomaly detection effectiveness using the BigDataBench benchmarks. Our experiments show that the overhead of HybridTune is only 5%, and the accuracy of the outlier detection algorithm reaches up to 93%. Finally, we report several use cases of diagnosing Spark and Hadoop workloads with BigDataBench, which demonstrate the potential of HybridTune.
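The stage-similarity observation in the abstract can be illustrated with a minimal sketch. This is not HybridTune's actual algorithm, which the abstract does not specify. It assumes a simple prior rule: because tasks in one stage run similar code, a task whose duration exceeds a multiple of its stage's median duration is flagged as a straggler. The function name `find_stragglers`, the input tuple layout, and the 1.5 threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from statistics import median


def find_stragglers(tasks, threshold=1.5):
    """Flag tasks that run much longer than their stage peers.

    tasks: iterable of (stage_id, task_id, duration_seconds) tuples.
    threshold: hypothetical multiplier over the per-stage median duration.
    Returns a list of (stage_id, task_id) pairs for flagged tasks.
    """
    # Group durations by stage, since only tasks within the same stage
    # are assumed to execute similar code and be comparable.
    by_stage = defaultdict(list)
    for stage_id, task_id, duration in tasks:
        by_stage[stage_id].append((task_id, duration))

    stragglers = []
    for stage_id, stage_tasks in by_stage.items():
        stage_median = median(d for _, d in stage_tasks)
        for task_id, duration in stage_tasks:
            # Rule: a task is a straggler if it exceeds threshold x median.
            if duration > threshold * stage_median:
                stragglers.append((stage_id, task_id))
    return stragglers


# Made-up sample durations, for illustration only.
sample_tasks = [
    (0, "t0", 10.0), (0, "t1", 11.0), (0, "t2", 9.0), (0, "t3", 30.0),
    (1, "t4", 5.0), (1, "t5", 5.5),
]
print(find_stragglers(sample_tasks))
```

A rule like this needs no training data, which is the appeal of prior rules. The abstract indicates that HybridTune pairs such rules with machine learning algorithms for cases like outlier metrics, where a fixed threshold is harder to choose.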
Pages: 1167-1184
Number of pages: 17