Energy-based anomaly detection for mixed data

Cited by: 9

Authors:

Do, Kien [1]

Tran, Truyen [1]

Venkatesh, Svetha [1]

Affiliations:

[1] Deakin Univ, Appl AI Inst, 75 Pigdons Rd, Waurn Ponds, Vic 3216, Australia

Source:

KNOWLEDGE AND INFORMATION SYSTEMS | 2018 / Volume 57 / Issue 02

Keywords:

Mixed data; Mixed-variate restricted Boltzmann machine; Deep belief net; Multilevel anomaly detection; OUTLIER DETECTION APPROACH;

DOI:

10.1007/s10115-018-1168-z

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Anomalies are data points that deviate significantly from the norm. Thus, anomaly detection amounts to finding data points located far away from their neighbors, i.e., those lying in low-density regions. Classic anomaly detection methods are largely designed for a single data type, either continuous or discrete. However, real-world data is increasingly heterogeneous, where a data point can have both discrete and continuous attributes. Mixed data poses multiple challenges including (a) capturing the inter-type correlation structures and (b) measuring deviation from the norm under multiple types. These challenges are exacerbated under (c) high-dimensional regimes. In this paper, we propose a new scalable unsupervised anomaly detection method for mixed data based on the Mixed-variate Restricted Boltzmann Machine (Mv.RBM). The Mv.RBM is a principled probabilistic method that estimates the density of mixed data. We propose to use the free energy derived from Mv.RBM as the anomaly score, since it equals the negative log-density of the data up to an additive constant. We then extend this method to detect anomalies across multiple levels of data abstraction, an effective approach to deal with high-dimensional settings. The extension is dubbed MIXMAD, which stands for MIXed data Multilevel Anomaly Detection. In MIXMAD, we sequentially construct an ensemble of mixed-data Deep Belief Nets (DBNs) with varying depths. Each DBN is an energy-based detector at a predefined abstraction level. Predictions across the ensemble are finally combined via a simple rank aggregation method. The proposed methods are evaluated on a comprehensive suite of synthetic and real high-dimensional datasets. The results demonstrate that for anomaly detection, (a) a proper handling of mixed types is necessary, (b) free energy is a powerful anomaly scoring method, (c) multilevel abstraction of data is important for high-dimensional data, and (d) empirically Mv.RBM and MIXMAD are superior to popular unsupervised detection methods for both homogeneous and mixed data.
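
The two mechanisms named in the abstract, free-energy scoring and rank aggregation, can be illustrated with a minimal sketch. It rests on assumptions that are not in the record: a plain binary-binary RBM stands in for the Mv.RBM, the parameters `W`, `a`, `b` are random rather than trained, and "simple rank aggregation" is taken to mean averaging per-detector ranks. For a binary RBM with energy E(v, h) = -a·v - b·h - vᵀWh, the free energy is F(v) = -a·v - Σ_k log(1 + exp(b_k + (vᵀW)_k)), so -log p(v) = F(v) + log Z, and a higher free energy means a lower density.

```python
import numpy as np

def free_energy(v, W, a, b):
    """Free energy of a binary-binary RBM (a stand-in, not the Mv.RBM itself).

    v: (n, d) or (d,) visible vectors; W: (d, k) weights;
    a: (d,) visible biases; b: (k,) hidden biases.
    """
    pre = v @ W + b                      # hidden pre-activations
    softplus = np.logaddexp(0.0, pre)    # log(1 + exp(pre)), numerically stable
    return -(v @ a) - softplus.sum(axis=-1)

def mean_rank(score_lists):
    """Average the per-detector ranks (rank 0 = least anomalous).

    Mean-rank is an assumed aggregation rule, chosen for illustration only.
    """
    ranks = [np.argsort(np.argsort(s)) for s in score_lists]
    return np.mean(ranks, axis=0)

# Hypothetical, untrained parameters and random binary data.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(6, 4))
a = np.zeros(6)
b = np.zeros(4)
X = rng.integers(0, 2, size=(5, 6)).astype(float)

scores = free_energy(X, W, a, b)         # higher score = more anomalous
combined = mean_rank([scores, scores])   # two identical "detectors" for demo
```

In practice each detector in the ensemble would produce its own score vector, and the combined ranks would be thresholded or sorted to flag anomalies.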

References

Pages: 413-435

Page count: 23

51 entries in total

[1] Aggarwal CC, 2001, LECT NOTES COMPUT SC, V1973, P420

[2] Angiulli F., 2002, Principles of Data Mining and Knowledge Discovery. 6th European Conference, PKDD 2002. Proceedings (Lecture Notes in Artificial Intelligence Vol.2431), P15

[3] [Anonymous], 2012, CIKM. ACM, DOI [10.1145/2396761.2396816, 10.1145/2396761]

[4] [Anonymous], P 3 AS C MACH LEARN

[5] [Anonymous], 2015, ACM SIGKDD explorations newsletter, DOI 10.1145/2830544.2830549

[6] [Anonymous], INT C MACH LEARN ICM

[7] Becker J., 2015, SPIE DEFENSE SECURIT

[8] Bengio, Yoshua; Courville, Aaron; Vincent, Pascal. Representation Learning: A Review and New Perspectives [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2013, 35 (08): 1798-1828

[9] Bontemps, Loic; Van Loi Cao; McDermott, James; Nhien-An Le-Khac. Collective Anomaly Detection Based on Long Short-Term Memory Recurrent Neural Networks [J]. FUTURE DATA AND SECURITY ENGINEERING, FDSE 2016, 2016, 10018: 141-152

[10] Bouguessa, Mohamed. A practical outlier detection approach for mixed-attribute data [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (22): 8637-8649
