Factor analysis of mixed data for anomaly detection

Times cited: 3
Authors
Davidow, Matthew [1]
Matteson, David S. [1]
Affiliations
[1] Cornell Univ, Ctr Appl Math, Ithaca, NY 14853 USA
Funding
National Science Foundation (NSF)
Keywords
anomaly detection; factor analysis of mixed data; mixed data; outlier detection; principal component analysis; subspace selection; robust PCA
DOI
10.1002/sam.11585
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Anomaly detection aims to identify observations that deviate from the typical pattern of the data. In practice, anomalous observations may correspond to financial fraud, health risks, or incorrectly measured data. We focus on unsupervised detection in the case of continuous and categorical (mixed) variables. We show that detecting anomalies in mixed data is improved by first embedding the data and then applying an anomaly scoring scheme. We propose a kurtosis-weighted Factor Analysis of Mixed Data (FAMD) for anomaly detection, which yields a continuous embedding for anomaly scoring. We illustrate that anomalies are highly separable in the first and last few ordered dimensions of this space, and we run experiments with various anomaly scoring schemes within this subspace. Results are illustrated on both simulated and real datasets, and the proposed approach is highly accurate for mixed data across these diverse scenarios.
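The abstract describes the pipeline only at a high level: embed mixed data with FAMD, keep the first and last few ordered dimensions, then score anomalies there. The Python sketch below illustrates that idea under stated assumptions. The FAMD step uses the standard scaling (continuous columns z-scored; each categorical level's indicator divided by the square root of its proportion and centered), followed by an SVD. The kurtosis weighting (standardize each selected component and weight its squared score by its sample kurtosis) and the squared-distance score are one hypothetical reading of "kurtosis-weighted", not the authors' exact formulas. The function names `famd_embedding` and `anomaly_scores` and the parameter `k` are illustrative.

```python
import numpy as np


def famd_embedding(X_num, X_cat, tol=1e-8):
    """FAMD-style embedding: SVD of a jointly scaled mixed-data matrix.

    X_num: (n, p) float array of continuous variables.
    X_cat: (n, q) array of categorical labels.
    Returns component scores (n, r), ordered by decreasing singular value.
    """
    # Continuous block: z-score each column.
    blocks = [(X_num - X_num.mean(axis=0)) / X_num.std(axis=0)]
    for j in range(X_cat.shape[1]):
        levels, codes = np.unique(X_cat[:, j], return_inverse=True)
        indicator = np.eye(len(levels))[codes]        # one-hot, shape (n, m_j)
        prop = indicator.mean(axis=0)                 # level proportions p_k
        # FAMD scaling: divide by sqrt(p_k); the scaled column's mean is sqrt(p_k).
        blocks.append(indicator / np.sqrt(prop) - np.sqrt(prop))
    Z = np.hstack(blocks)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    # Drop numerically null directions (centering removes one rank per categorical block).
    keep = s > tol * s[0]
    return U[:, keep] * s[keep]


def anomaly_scores(emb, k=2):
    """Hypothetical kurtosis-weighted score over the first and last k components."""
    r = emb.shape[1]
    idx = sorted(set(range(min(k, r))) | set(range(max(r - k, 0), r)))
    # Scores are already centered because Z is column-centered; rescale to unit variance.
    S = emb[:, idx] / emb[:, idx].std(axis=0)
    w = (S ** 4).mean(axis=0)            # Pearson kurtosis per component (3 for a Gaussian)
    return (w * S ** 2).sum(axis=1)      # kurtosis-weighted squared distance from the origin


# Usage on synthetic mixed data with five planted anomalies.
rng = np.random.default_rng(0)
n = 500
X_num = rng.normal(size=(n, 3))
X_num[:5] += 8.0  # shift rows 0-4 far from the bulk in every continuous variable
X_cat = np.column_stack([rng.choice(["a", "b", "c"], size=n),
                         rng.choice(["x", "y"], size=n)])
emb = famd_embedding(X_num, X_cat)
scores = anomaly_scores(emb, k=2)
print(np.argsort(scores)[::-1][:5])  # indices of the five highest scores
```

Here `k` sets how many leading and trailing components enter the score. Both ends of the ordered spectrum are kept because the abstract reports that anomalies separate well in the first and last few dimensions; weighting by kurtosis is meant to emphasize components whose heavy tails come from a few extreme rows.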
Pages: 480-493
Number of pages: 14