AIDA: Analytic isolation and distance-based anomaly detection algorithm

Cited by: 12
Authors
Arias, Luis Antonio Souto [1]
Oosterlee, Cornelis W. [1]
Cirillo, Pasquale [2]
Affiliations
[1] Univ Utrecht, Math Inst, Budapestlaan 6, NL-3584 CD Utrecht, Netherlands
[2] Zurich Univ Appl Sci, ZHAW Sch Management & Law, Theaterstr 17, CH-8401 Winterthur, Switzerland
Keywords
Outlier detection; Anomaly explanation; Isolation; Distance; Ensemble methods; Optimization
DOI
10.1016/j.patcog.2023.109607
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many unsupervised anomaly detection algorithms rely on the concept of nearest neighbours to compute the anomaly scores. Such algorithms are popular because there are no assumptions about the data, making them a robust choice for unstructured datasets. However, the number (k) of nearest neighbours, which critically affects the model performance, cannot be tuned in an unsupervised setting. Hence, we propose the new and parameter-free Analytic Isolation and Distance-based Anomaly (AIDA) detection algorithm, which combines the metrics of distance with isolation. Based on AIDA, we also introduce the Tempered Isolation-based eXplanation (TIX) algorithm, which identifies the most relevant features characterizing an outlier, even in large multi-dimensional datasets, improving the overall explainability of the detection mechanism. Both AIDA and TIX are thoroughly tested and compared with state-of-the-art alternatives, proving to be useful additions to the existing set of tools in anomaly detection. © 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Pages: 15