Detecting performance anomalies in scientific workflows using hierarchical temporal memory

Cited by: 21
Authors
Rodriguez, Maria A. [1]
Kotagiri, Ramamohanarao [1]
Buyya, Rajkumar [1]
Affiliations
[1] Univ Melbourne, Sch Comp & Informat Syst, Cloud Comp & Distributed Syst CLOUDS Lab, Melbourne, Vic, Australia
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2018, Vol. 88
Keywords
Online anomaly detection; Scientific workflow; Hierarchical temporal memory; Performance anomalies
DOI
10.1016/j.future.2018.05.014
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Technological advances and the emergence of the Internet of Things have led to the collection of vast amounts of scientific data from increasingly powerful scientific instruments and a growing number of distributed sensors. This has not only heightened the importance of the analyses performed by scientific applications but has also increased their complexity and scale. Hence, emerging extreme-scale scientific workflows are becoming widespread, and so is the need to efficiently automate their deployment on a variety of platforms such as high performance computers, dedicated clusters, and cloud environments. Performance anomalies can considerably affect the execution of these applications. They may be caused by different factors, including failures and resource contention, and they may lead to undesired outcomes such as lengthy delays in the workflow runtime or unnecessary costs in cloud environments. As a result, it is essential for modern workflow management systems to enable the early detection of this type of anomaly, to identify its cause, and to formulate and execute actions to mitigate its effects. In this work, we propose the use of Hierarchical Temporal Memory (HTM) to detect performance anomalies on real-time infrastructure metrics collected by continuously monitoring the resource consumption of executing workflow tasks. The framework is capable of processing a stream of measurements in an online and unsupervised manner and is successful in adapting to changes in the underlying statistics of the data. This allows it to be easily deployed on a variety of infrastructure platforms without the need to collect data and train a model beforehand. We evaluate our approach by using two real scientific workflows deployed in Microsoft Azure's cloud infrastructure. Our experimental results demonstrate the ability of our model to accurately capture performance anomalies on different resource consumption metrics caused by a variety of competing workloads introduced into the system. A performance comparison of HTM to other online anomaly detection algorithms is also presented, demonstrating the suitability of the chosen algorithm for the problem presented in this work. (C) 2018 Elsevier B.V. All rights reserved.
Pages: 624-635
Page count: 12