Little Help Makes a Big Difference: Leveraging Active Learning to Improve Unsupervised Time Series Anomaly Detection

Cited: 2
Authors
Bodor, Hamza [1 ,2 ]
Hoang, Thai, V [1 ]
Zhang, Zonghua [1 ]
Affiliations
[1] Huawei Technol France, Paris Res Ctr, Boulogne, France
[2] Ecole Ponts ParisTech, Marne La Vallee, France
Source
SERVICE-ORIENTED COMPUTING, ICSOC 2021 WORKSHOPS | 2022 / Vol. 13236
Keywords
Active learning; Anomaly detection; Time series data;
D O I
10.1007/978-3-031-14135-5_13
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Key Performance Indicators (KPIs), which are essentially time series data, have been widely used to indicate the performance of telecom networks. Based on the given KPIs, a large set of anomaly detection algorithms have been deployed to detect unexpected network incidents. Generally, unsupervised anomaly detection algorithms are more popular than supervised ones, because labeling KPIs is extremely time- and resource-consuming, and error-prone. However, unsupervised algorithms often suffer from excessive false alarms, especially in the presence of concept drift resulting from network re-configuration or maintenance. To tackle this challenge and improve the overall performance of unsupervised anomaly detection, we propose to use active learning to incorporate feedback from operators, who can verify alarms (both false and true ones) and label the corresponding KPIs with reasonable effort. Specifically, we develop three query strategies to select the most informative and representative samples to label. We also develop an efficient method to update the weights of Isolation Forest and optimally adjust the decision threshold, so as to improve the performance of the detection model. Experiments on one public dataset and one proprietary dataset demonstrate that our active-learning-empowered anomaly detection pipeline achieves a performance gain of more than 50% in F1-score over the baseline algorithm. It also outperforms existing active-learning-based methods by approximately 6%-10%, with a significantly reduced budget (the ratio of samples to be labeled).
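The abstract describes a pipeline combining Isolation Forest with active-learning queries and feedback-driven threshold adjustment. The sketch below is only a minimal illustration of that general idea, not the paper's actual method: it uses scikit-learn's `IsolationForest` on synthetic KPI-like data, an uncertainty-style query strategy (label the samples scoring closest to the current threshold), and a simple threshold re-fit from hypothetical operator labels. The paper's three query strategies and its weight-update scheme are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Toy KPI-like data: mostly normal points plus a few injected anomalies.
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 1)),
               rng.normal(6.0, 1.0, size=(10, 1))])

forest = IsolationForest(random_state=0).fit(X)
# score_samples returns higher values for more normal points; negate so
# that a higher score means "more anomalous".
scores = -forest.score_samples(X)
threshold = np.quantile(scores, 0.95)  # initial unsupervised cut-off

# Uncertainty-style query strategy (one plausible choice, not the paper's):
# ask the operator to label the samples whose anomaly scores fall closest
# to the current decision threshold -- the most ambiguous cases.
budget = 5
query_idx = np.argsort(np.abs(scores - threshold))[:budget]

# Hypothetical operator feedback: in this toy setup, indices >= 500 are
# the injected anomalies, so the "operator" labels them 1.
feedback = {int(i): int(i >= 500) for i in query_idx}

# Simple threshold adjustment from feedback: place the cut-off midway
# between the highest score labeled normal and the lowest score labeled
# anomalous, whenever the two labeled groups are separable.
normal_s = [scores[i] for i, y in feedback.items() if y == 0]
anomal_s = [scores[i] for i, y in feedback.items() if y == 1]
if normal_s and anomal_s and max(normal_s) < min(anomal_s):
    threshold = (max(normal_s) + min(anomal_s)) / 2.0
```

In a real deployment the loop would repeat: detect with the current model, query a small batch near the decision boundary, collect operator labels, and adjust the model and threshold, keeping the labeling budget low.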
Pages: 165-176
Page count: 12