Robust probabilistic PCA with missing data and contribution analysis for outlier detection

Cited by: 95
Authors
Chen, Tao [1]
Martin, Elaine [2]
Montague, Gary [2]
Affiliations
[1] Nanyang Technol Univ, Sch Chem & Biomed Engn, Singapore 637459, Singapore
[2] Univ Newcastle, Sch Chem Engn & Adv Mat, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
Keywords
PRINCIPAL COMPONENTS; COVARIANCE; IDENTIFICATION; MATRIX
DOI
10.1016/j.csda.2009.03.014
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Principal component analysis (PCA) is a widely adopted multivariate data analysis technique, with interpretation being established on the basis of both classical linear projection and a probability model (i.e., probabilistic PCA (PPCA)). Recently, robust PPCA models based on the multivariate t-distribution have been proposed to handle the situation where there may be outliers within the data set. This paper presents an overview of the robust PPCA technique, and further discusses the issue of missing data. An expectation-maximization (EM) algorithm is presented for the maximum likelihood estimation of the model parameters in the presence of missing data. When applying robust PPCA for outlier detection, a contribution analysis method is proposed to identify which variables contribute the most to the occurrence of outliers, providing valuable information regarding the source of outlying data. The proposed technique is demonstrated on numerical examples and applied to outlier detection and diagnosis in an industrial fermentation process. (C) 2009 Elsevier B.V. All rights reserved.
Pages: 3706-3716
Number of pages: 11