Real-Time Anomaly Detection of NoSQL Systems Based on Resource Usage Monitoring

Cited by: 15

Authors:

Chouliaras, Spyridon [1]

Sotiriadis, Stelios [1]

Affiliations:

[1] Birkbeck Univ London, Dept Comp Sci & Informat Syst, London WC1E 7HX, England

Source:

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS | 2020 / Vol. 16 / Issue 09

Keywords:

Monitoring; Radar; Real-time systems; Informatics; Throughput; Anomaly detection; Stress; cloud computing; not only SQL (NoSQL) systems; real-time analytics

DOI:

10.1109/TII.2019.2958606

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Subject classification code:

0812

Abstract:

Today, the emergence of industrial revolution systems such as Industry 4.0, the Internet of Things, and big data frameworks poses new challenges for storing and processing real-time data. As systems scale to very large sizes, a crucial task is to administer the variety of subsystems and applications to ensure high performance. This is directly related to identifying and eliminating system failures and errors while the system runs. In particular, database systems may experience abnormalities related to decreased throughput or increased resource usage, which in turn affect system performance. In this article, we focus on not only SQL (NoSQL) database systems, which are ideal for storing sensor data in the context of Industry 4.0. Such systems typically include a variety of applications and workloads that are difficult to monitor online, making anomaly detection a challenging task. Creating a robust platform to serve such infrastructures with minimal hardware or software failures is a key challenge. We propose RADAR, an anomaly detection system that works in real time. RADAR is a data-driven decision-making system for NoSQL systems that extracts process information during resource monitoring and associates resource usage with the top processes to identify anomalous cases. We focus on anomalies such as hardware failures or software bugs that could lead to abnormal application runs without necessarily stopping system functionality (e.g., through a system crash) but that affect its performance (e.g., decreased database system throughput). Although different patterns may occur over time, we focus on periodically running workloads (e.g., monitoring daily usage), which are very common for NoSQL systems, and on Internet of Things scenarios where data streams are forwarded to the cloud for storage and processing. We apply various machine learning algorithms, such as autoregressive integrated moving average (ARIMA), seasonal ARIMA, and long short-term memory recurrent neural networks. We experimentally analyze our solution to demonstrate the benefits of supporting online erroneous state identification and characterization for modern applications.

Pages: 6042-6049

Page count: 8
