Robust probabilistic PCA with missing data and contribution analysis for outlier detection

Citations: 99
Authors
Chen, Tao [1]
Martin, Elaine [2]
Montague, Gary [2]
Affiliations
[1] Nanyang Technol Univ, Sch Chem & Biomed Engn, Singapore 637459, Singapore
[2] Univ Newcastle, Sch Chem Engn & Adv Mat, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
Keywords
PRINCIPAL COMPONENTS; COVARIANCE; IDENTIFICATION; MATRIX;
DOI
10.1016/j.csda.2009.03.014
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Principal component analysis (PCA) is a widely adopted multivariate data analysis technique that can be interpreted both as a classical linear projection and through a probability model, i.e. probabilistic PCA (PPCA). Robust PPCA models based on the multivariate t-distribution have recently been proposed to handle data sets that may contain outliers. This paper presents an overview of the robust PPCA technique and further discusses the issue of missing data. An expectation-maximization (EM) algorithm is presented for maximum likelihood estimation of the model parameters in the presence of missing data. When robust PPCA is applied to outlier detection, a contribution analysis method is proposed to identify which variables contribute the most to the occurrence of outliers, providing valuable information regarding the source of outlying data. The proposed technique is demonstrated on numerical examples and applied to outlier detection and diagnosis in an industrial fermentation process. (C) 2009 Elsevier B.V. All rights reserved.
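The abstract's core idea, down-weighting outliers through a multivariate t-distribution fitted by EM, can be illustrated with a small sketch. This is not the paper's algorithm: it assumes complete data (no missing values), a fixed degrees-of-freedom value `nu`, and it fits a full scatter matrix and then takes its leading eigenvectors rather than fitting the PPCA factor model directly. The function names, the synthetic data, and the per-variable score at the end are illustrative assumptions, and the score is a simple stand-in for the paper's contribution analysis.

```python
import numpy as np


def latent_weights(X, mu, sigma, nu):
    """E-step weights of a multivariate t: small for points far from mu."""
    diff = X - mu
    # Squared Mahalanobis distance of every row under (mu, sigma).
    delta = np.einsum("ij,ij->i", diff @ np.linalg.inv(sigma), diff)
    return (nu + X.shape[1]) / (nu + delta)


def fit_multivariate_t(X, nu=4.0, n_iter=50):
    """EM for the mean and scatter of a multivariate t with fixed nu (assumed)."""
    n = X.shape[0]
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)
    for _ in range(n_iter):
        u = latent_weights(X, mu, sigma, nu)          # E-step
        mu = (u[:, None] * X).sum(axis=0) / u.sum()   # M-step: weighted mean
        diff = X - mu
        sigma = (u[:, None] * diff).T @ diff / n      # M-step: weighted scatter
    return mu, sigma, latent_weights(X, mu, sigma, nu)


# Synthetic data (hypothetical): three variables with different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.3])
X[0] = [0.0, 0.0, 25.0]  # planted outlier in the third variable

mu, sigma, u = fit_multivariate_t(X)

# Principal axes from the robust scatter (eigh returns ascending eigenvalues).
eigvals, eigvecs = np.linalg.eigh(sigma)
principal_axes = eigvecs[:, ::-1][:, :2]

# The observation with the smallest weight is the strongest outlier candidate.
outlier_index = int(np.argmin(u))

# Illustrative per-variable score: squared deviation scaled by robust variance.
contrib = (X[outlier_index] - mu) ** 2 / np.diag(sigma)
top_variable = int(np.argmax(contrib))
```

The weights `u` play the role of the latent scale variables in the t-distribution: a small weight means the point has little influence on the estimated mean and scatter, which is what makes the resulting principal axes less sensitive to outliers.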
Pages: 3706-3716
Page count: 11