A multivariate extreme value theory approach to anomaly clustering and visualization

Cited by: 0
Authors
Maël Chiapino
Stephan Clémençon
Vincent Feuillard
Anne Sabourin
Affiliations
[1] LTCI
[2] Télécom Paris
[3] Institut polytechnique de Paris
[4] Airbus Central R&T
[5] AI Research
Source
Computational Statistics | 2020 / Vol. 35
Keywords
Anomaly detection; Clustering; Graph-mining; Latent variable analysis; Mixture modelling; Multivariate extreme value theory; Visualization;
DOI
Not available
Abstract
In a wide variety of situations, anomalies in the behaviour of a complex system, whose health is monitored through the observation of a random vector $\mathbf{X} = (X_1, \ldots, X_d)$ valued in $\mathbb{R}^d$, correspond to the simultaneous occurrence of extreme values for certain subgroups $\alpha \subset \{1, \ldots, d\}$ of variables $X_j$. Under the heavy-tail assumption, which is precisely appropriate for modeling these phenomena, statistical methods relying on multivariate extreme value theory have been developed in the past few years for identifying such events/subgroups.
This paper takes this approach much further by means of a novel mixture model that describes the distribution of extremal observations and in which the anomaly type $\alpha$ is viewed as a latent variable. One may then take advantage of the model by assigning to any extreme point a posterior probability for each anomaly type $\alpha$, implicitly defining a similarity measure between anomalies. It is explained at length how the latter makes it possible to cluster extreme observations and obtain an informative planar representation of anomalies using standard graph-mining tools. The relevance and usefulness of the resulting clustering and 2-d visual display are illustrated on simulated datasets and on real observations from the aeronautics domain.
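As a rough illustration of how posterior probabilities over anomaly types could induce a similarity measure and a clustering, the following minimal Python sketch may help. It is not taken from the paper: the similarity used here (the probability that two points share the same latent type, i.e. the inner product of their posterior vectors), the threshold value, the connected-components grouping, and all function names are assumptions chosen for illustration only. The posterior values are made-up toy numbers, not results.

```python
# Hypothetical sketch: posterior-based similarity between extreme points,
# followed by a simple graph-based grouping. All names are illustrative.

def similarity_from_posteriors(post):
    """post[i][k] is an assumed posterior probability that extreme
    point i has anomaly type k (each row sums to 1).
    Returns the matrix of inner products between posterior vectors."""
    n = len(post)
    return [
        [sum(a * b for a, b in zip(post[i], post[j])) for j in range(n)]
        for i in range(n)
    ]

def connected_components(sim, threshold=0.5):
    """Link points whose similarity is at least `threshold` (an assumed
    tuning parameter) and label the connected components of that graph."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of the thresholded graph
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy posteriors for four extreme points and three anomaly types.
posteriors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 0.9],
]
sim = similarity_from_posteriors(posteriors)
labels = connected_components(sim, threshold=0.5)
```

In this toy setting the first two points, whose posteriors concentrate on the same type, end up in one group, while the other two each form their own. A weighted graph built from such a similarity matrix could equally be passed to other graph-mining tools (spectral clustering, graph layout for a planar display); the thresholding above is only the simplest choice.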
Pages: 607 / 628
Number of pages: 21
Related papers
50 in total
[31]   Multiscale and Multivariate Time Series Clustering: A New Approach [J].
Tokotoko, Jannai ;
Govan, Rodrigue ;
Lemonnier, Hugues ;
Selmaoui-Folcher, Nazha .
FOUNDATIONS OF INTELLIGENT SYSTEMS (ISMIS 2022), 2022, 13515 :283-293
[32]   An agglomerative hierarchical approach to visualization in Bayesian clustering problems [J].
Dawson, K. J. ;
Belkhir, K. .
HEREDITY, 2009, 103 (01) :32-45
[33]   An agglomerative hierarchical approach to visualization in Bayesian clustering problems [J].
K J Dawson ;
K Belkhir .
Heredity, 2009, 103 :32-45
[34]   Manifold Learning-based Clustering Approach Applied to Anomaly Detection in Surveillance Videos [J].
Lopes, Leonardo Tadeu ;
Valem, Lucas Pascotti ;
Guimaraes Pedronette, Daniel Carlos ;
Guilherme, Ivan Rizzo ;
Papa, Joao Paulo ;
Silva Santana, Marcos Cleison ;
Colombo, Danilo .
PROCEEDINGS OF THE 15TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS, VOL 5: VISAPP, 2020, :404-412
[35]   An Approach For Verifying And Validating Clustering Based Anomaly Detection Systems Using Metamorphic Testing [J].
Rehman, Faqeer Ur ;
Izurieta, Clemente .
2022 FOURTH IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE TESTING (AITEST 2022), 2022, :12-18
[36]   Self-consistent estimation of conditional multivariate extreme value distributions [J].
Liu, Y. ;
Tawn, J. A. .
JOURNAL OF MULTIVARIATE ANALYSIS, 2014, 127 :19-35
[37]   Modeling Concurrent Hydroclimatic Extremes With Parametric Multivariate Extreme Value Models [J].
Sharma, Shailza ;
Mujumdar, P. P. .
WATER RESOURCES RESEARCH, 2022, 58 (02)
[38]   Juniper: A Tree plus Table Approach to Multivariate Graph Visualization [J].
Nobre, Carolina ;
Streit, Marc ;
Lex, Alexander .
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2019, 25 (01) :544-554
[39]   A Geometric Approach to Clustering Based Anomaly Detection for Industrial Applications [J].
Li, Peng ;
Niggemann, Oliver ;
Hammer, Barbara .
IECON 2018 - 44TH ANNUAL CONFERENCE OF THE IEEE INDUSTRIAL ELECTRONICS SOCIETY, 2018, :5345-5352
[40]   Revisiting the hybrid approach of anomaly detection and extreme value theory for estimating pedestrian crashes using traffic conflicts obtained from artificial intelligence-based video analytics [J].
Hussain, Fizza ;
Ali, Yasir ;
Li, Yuefeng ;
Haque, Md Mazharul .
ACCIDENT ANALYSIS AND PREVENTION, 2024, 199