Real-Time Anomaly Detection for Large-Scale Network Devices

Times cited: 0
Authors
Tao, Lei [1]
Ma, Minghua [2]
Zhang, Shenglin [1,3]
Kuang, Junhua
Guo, Xiao-Wei [4]
Yang, Canqun [4,5]
Pei, Dan [6,7]
Affiliations
[1] Nankai Univ, Coll Software, Tianjin 300192, Peoples R China
[2] Microsoft, Redmond, WA 98052 USA
[3] Haihe Lab Informat Technol Applicat Innovat HL IT, Tianjin 300459, Peoples R China
[4] Natl Univ Def Technol, Coll Comp Sci, Changsha 410073, Peoples R China
[5] Natl Supercomp Ctr Tianjin, Tianjin 300456, Peoples R China
[6] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100190, Peoples R China
[7] Beijing Natl Res Ctr Informat Sci & Technol, Beijing 100190, Peoples R China
Source
IEEE TRANSACTIONS ON NETWORKING | 2025 / Vol. 33 / No. 3
Funding
National Natural Science Foundation of China;
Keywords
Anomaly detection; Time series analysis; Measurement; Software; Monitoring; Hardware; Computational efficiency; Training data; Training; Real-time systems; network devices; multivariate time series; dynamic mode decomposition; DYNAMIC-MODE DECOMPOSITION;
DOI
10.1109/TON.2025.3529861
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
With the rapid growth of large-scale network devices, anomaly detection on multivariate time series (MTS), such as a combination of CPU utilization, average response time, and network packet loss, is important for system reliability. Although a number of learning-based approaches have been designed for this purpose, our study shows that they suffer from long initialization times because they require sufficient training data. Our previously proposed JumpStarter model is an MTS anomaly detection method with a short initialization time and good detection performance. However, it has a high computational cost and is ill-suited to periodic MTS. In this paper, we propose VersaGuardian, which introduces the Dynamic Mode Decomposition (DMD) technique to anomaly detection on diverse types of MTS, with rapid initialization and high computational efficiency. On real-world MTS datasets collected from three companies, our results show that VersaGuardian achieves an average F1 score of 94.42%, significantly outperforming popular anomaly detection algorithms, with a much shorter initialization time of 20 minutes and a detection time of 15.28 milliseconds.
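To make the DMD idea in the abstract concrete, here is a minimal sketch of standard exact Dynamic Mode Decomposition fitted to a window of multivariate metrics. It is not VersaGuardian's actual algorithm. The function names `fit_dmd` and `one_step_errors`, the truncation `rank`, the synthetic three-metric signal, and the use of one-step prediction error as an anomaly score are all hypothetical choices made for illustration. DMD approximates the dynamics as a linear map x_{k+1} ≈ A x_k estimated from consecutive snapshots. A point that this map cannot predict well is a candidate anomaly.

```python
import numpy as np


def fit_dmd(X, rank):
    """Exact DMD on a window X of shape (n_metrics, n_steps).

    Returns the rank-r basis U_r, the reduced operator A_tilde,
    its eigenvalues, and the exact DMD modes.
    """
    X1, X2 = X[:, :-1], X[:, 1:]            # snapshot pairs x_k -> x_{k+1}
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U_r, s_r, V_r = U[:, :rank], s[:rank], Vh[:rank, :].conj().T
    # Reduced linear operator: A_tilde = U_r^* X2 V_r S_r^{-1}
    A_tilde = U_r.conj().T @ X2 @ V_r / s_r
    eigvals, W = np.linalg.eig(A_tilde)
    # Exact DMD modes: Phi = X2 V_r S_r^{-1} W
    modes = X2 @ V_r @ np.diag(1.0 / s_r) @ W
    return U_r, A_tilde, eigvals, modes


def one_step_errors(X, U_r, A_tilde):
    """Hypothetical anomaly score: ||x_{k+1} - U_r A_tilde U_r^* x_k|| per step."""
    Z = U_r.conj().T @ X[:, :-1]             # project onto the DMD subspace
    X_pred = U_r @ (A_tilde @ Z)             # advance one step and lift back
    return np.linalg.norm(X[:, 1:] - X_pred, axis=0).real


# Synthetic example (assumption): three "metrics" driven by one periodic
# component, so the dynamics are exactly linear with rank 2.
theta = 0.3
k = np.arange(200)
clean = np.vstack([np.cos(theta * k),
                   np.sin(theta * k),
                   np.cos(theta * k) + np.sin(theta * k)])

# Fit on the first 100 steps (treated as the normal initialization window).
U_r, A_tilde, eigvals, modes = fit_dmd(clean[:, :100], rank=2)

# Score the next 100 steps after injecting a spike into the first metric.
test = clean[:, 100:].copy()
test[:, 50] += np.array([3.0, 0.0, 0.0])
scores = one_step_errors(test, U_r, A_tilde)
# scores[49] (spike as target) and scores[50] (spike as input) stand out;
# every other score stays near floating-point zero.
```

Because the synthetic signal is periodic, the fitted eigenvalues lie on the unit circle at angles ±theta. This illustrates why DMD can represent periodic MTS compactly. Real deployments would need to choose the window length, rank, and scoring threshold. Those choices are not specified here.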
Pages: 1326-1337
Number of pages: 12